 things over the
network; you would need to use the more system-specific C<fcntl> for
that.  If you like you can force Perl to ignore your system's flock(2)
function, and so provide its own fcntl(2)-based emulation, by passing
the switch C<-Ud_flock> to the F<Configure> program when you configure
perl.

Here's a mailbox appender for BSD systems.

    use Fcntl ':flock'; # import LOCK_* constants

    sub lock {
	flock(MBOX,LOCK_EX);
	# and, in case someone appended
	# while we were waiting...
	seek(MBOX, 0, 2);
    }

    sub unlock {
	flock(MBOX,LOCK_UN);
    }

    open(MBOX, ">>/usr/spool/mail/$ENV{'USER'}")
	    or die "Can't open mailbox: $!";

    lock();
    print MBOX $msg,"\n\n";
    unlock();

On systems that support a real flock(), locks are inherited across fork()
calls, whereas those that must resort to the more capricious fcntl()
function lose the locks, making it harder to write servers.

See also L<DB_File> for other flock() examples.
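
The same append-under-lock pattern can be sketched with a lexical
filehandle and explicit error checks.  This is only an illustration:
C<append_locked> is a hypothetical helper, and a temporary file stands
in for a real mailbox.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use File::Temp qw(tempfile);

# Hypothetical helper: append one record to $path under an exclusive lock.
sub append_locked {
    my ($path, $record) = @_;
    open(my $fh, '>>', $path)  or die "Can't open $path: $!";
    flock($fh, LOCK_EX)        or die "Can't lock $path: $!";
    # Someone may have appended while we were waiting for the lock.
    seek($fh, 0, SEEK_END)     or die "Can't seek in $path: $!";
    print {$fh} $record, "\n"  or die "Can't write to $path: $!";
    close($fh)                 or die "Can't close $path: $!";  # also drops the lock
    return 1;
}

# A temporary file stands in for the mailbox in this sketch.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);

append_locked($tmp_path, "first");
append_locked($tmp_path, "second");
```

Closing the handle releases the lock, so no separate unlock step is
needed in this version.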

=item fork

Does a fork(2) system call to create a new process running the
same program at the same point.  It returns the child pid to the
parent process, C<0> to the child process, or C<undef> if the fork is
unsuccessful.  File descriptors (and sometimes locks on those descriptors)
are shared, while everything else is copied.  On most systems supporting
fork(), great care has gone into making it extremely efficient (for
example, using copy-on-write technology on data pages), making it the
dominant paradigm for multitasking over the last few decades.

Beginning with v5.6.0, Perl will attempt to flush all files opened for
output before forking the child process, but this may not be supported
on some platforms (see L<perlport>).  To be safe, you may need to set
C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
C<IO::Handle> on any open handles in order to avoid duplicate output.

If you C<fork> without ever waiting on your children, you will
accumulate zombies.  On some systems, you can avoid this by setting
C<$SIG{CHLD}> to C<"IGNORE">.  See also L<perlipc> for more examples of
forking and reaping moribund children.
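
A minimal sketch of forking one child and reaping it with C<waitpid>
might look like the following.  The exit status of 7 is arbitrary and
chosen only so the parent has something to inspect.

```perl
use strict;
use warnings;

# Fork a child, let it exit with a known status, and reap it.
my $pid = fork();
die "Can't fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child process: a real program would do its work here.
    exit 7;
}

# Parent process: wait for that child so it doesn't linger as a zombie.
my $reaped      = waitpid($pid, 0);
my $exit_status = $? >> 8;          # high byte of $? holds the exit code
print "child $reaped exited with status $exit_status\n";
```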

Note that if your forked child inherits system file descriptors like
STDIN and STDOUT that are actually connected by a pipe or socket, even
if you exit, then the remote server (such as, say, a CGI script or a
backgrounded job launched from a remote shell) won't think you're done.
You should reopen those to F</dev/null> if it's any issue.

=item format

Declare a picture format for use by the C<write> function.  For
example:

    format Something =
	Test: @<<<<<<<< @||||| @>>>>>
	      $str,     $%,    '$' . int($num)
    .

    $str = "widget";
    $num = $cost/$quantity;
    $~ = 'Something';
    write;

See L<perlform> for many details and examples.

=item formline PICTURE,LIST

This is an internal function used by C<format>s, though you may call it,
too.  It formats (see L<perlform>) a list of values according to the
contents of PICTURE, placing the output into the format output
accumulator, C<$^A> (or C<$ACCUMULATOR> in English).
Eventually, when a C<write> is done, the contents of
C<$^A> are written to some filehandle, but you could also read C<$^A>
yourself and then set C<$^A> back to C<"">.  Note that a format typically
does one C<formline> per line of form, but the C<formline> function itself
doesn't care how many newlines are embedded in the PICTURE.  This means
that the C<~> and C<~~> tokens will treat the entire PICTURE as a single line.
You may therefore need to use multiple formlines to implement a single
record format, just like the format compiler.

Be careful if you put double quotes around the picture, because an C<@>
character may be taken to mean the beginning of an array name.
C<formline> always returns true.  See L<perlform> for other examples.
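
As a small sketch of calling C<formline> by hand, the following builds
one line in C<$^A>, copies it out, and resets the accumulator.  The
picture is single-quoted so that C<@> is not taken as an array name;
the field widths are arbitrary.

```perl
use strict;
use warnings;

$^A = "";                                   # start with an empty accumulator

# One left-justified field (8 wide) and one right-justified field (5 wide).
formline('@<<<<<<< @>>>>' . "\n", "widget", 42);

my $line = $^A;                             # read what was accumulated
$^A = "";                                   # and set it back to ""
print $line;
```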

=item getc FILEHANDLE

=item getc

Returns the next character from the input file attached to FILEHANDLE,
or the undefined value at end of file, or if there was an error.
If FILEHANDLE is omitted, reads from STDIN.  This is not particularly
efficient.  However, it cannot be used by itself to fetch single
characters without waiting for the user to hit enter.  For that, try
something more like:

    if ($BSD_STYLE) {
	system "stty cbreak </dev/tty >/dev/tty 2>&1";
    }
    else {
	system "stty", '-icanon', 'eol', "\001";
    }

    $key = getc(STDIN);

    if ($BSD_STYLE) {
	system "stty -cbreak </dev/tty >/dev/tty 2>&1";
    }
    else {
	system "stty", 'icanon', 'eol', '^@'; # ASCII null
    }
    print "\n";

Determination of whether $BSD_STYLE should be set
is left as an exercise to the reader.

The C<getattr> method of C<POSIX::Termios> can do this more portably on
systems purporting POSIX compliance.  See also the C<Term::ReadKey>
module from your nearest CPAN site; details on CPAN can be found in
L<perlmodlib/CPAN>.

=item getlogin

Implements the C library function of the same name, which on most
systems returns the current login from F</etc/utmp>, if any.  If null,
use C<getpwuid>.

    $login = getlogin || getpwuid($<) || "Kilroy";

Do not consider C<getlogin> for authentication: it is not as
secure as C<getpwuid>.

=item getpeername SOCKET

Returns the packed sockaddr address of other end of the SOCKET connection.

    use Socket;
    $hersockaddr    = getpeername(SOCK);
    ($port, $iaddr) = sockaddr_in($hersockaddr);
    $herhostname    = gethostbyaddr($iaddr, AF_INET);
    $herstraddr     = inet_ntoa($iaddr);

=item getpgrp PID

Returns the current process group for the specified PID.  Use
a PID of C<0> to get the current process group for the
current process.  Will raise an exception if used on a machine that
doesn't implement getpgrp(2).  If PID is omitted, returns process
group of current process.  Note that the POSIX version of C<getpgrp>
does not accept a PID argument, so only C<PID==0> is truly portable.

=item getppid

Returns the process id of the parent process.

=item getpriority WHICH,WHO

Returns the current priority for a process, a process group, or a user.
(See L<getpriority(2)>.)  Will raise a fatal exception if used on a
machine that doesn't implement getpriority(2).

=item getpwnam NAME

=item getgrnam NAME

=item gethostbyname NAME

=item getnetbyname NAME

=item getprotobyname NAME

=item getpwuid UID

=item getgrgid GID

=item getservbyname NAME,PROTO

=item gethostbyaddr ADDR,ADDRTYPE

=item getnetbyaddr ADDR,ADDRTYPE

=item getprotobynumber NUMBER

=item getservbyport PORT,PROTO

=item getpwent

=item getgrent

=item gethostent

=item getnetent

=item getprotoent

=item getservent

=item setpwent

=item setgrent

=item sethostent STAYOPEN

=item setnetent STAYOPEN

=item setprotoent STAYOPEN

=item setservent STAYOPEN

=item endpwent

=item endgrent

=item endhostent

=item endnetent

=item endprotoent

=item endservent

These routines perform the same functions as their counterparts in the
system library.  In list context, the return values from the
various get routines are as follows:

    ($name,$passwd,$uid,$gid,
       $quota,$comment,$gcos,$dir,$shell,$expire) = getpw*
    ($name,$passwd,$gid,$members) = getgr*
    ($name,$aliases,$addrtype,$length,@addrs) = gethost*
    ($name,$aliases,$addrtype,$net) = getnet*
    ($name,$aliases,$proto) = getproto*
    ($name,$aliases,$port,$proto) = getserv*

(If the entry doesn't exist you get a null list.)

The exact meaning of the $gcos field varies but it usually contains
the real name of the user (as opposed to the login name) and other
information pertaining to the user.  Beware, however, that on many
systems users are able to change this information, so it cannot be
trusted; the $gcos field is therefore tainted (see L<perlsec>).  The
$passwd and $shell fields (the user's encrypted password and login
shell) are also tainted, for the same reason.

In scalar context, you get the name, unless the function was a
lookup by name, in which case you get the other thing, whatever it is.
(If the entry doesn't exist you get the undefined value.)  For example:

    $uid   = getpwnam($name);
    $name  = getpwuid($num);
    $name  = getpwent();
    $gid   = getgrnam($name);
    $name  = getgrgid($num);
    $name  = getgrent();
    #etc.
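
To see both contexts side by side, here is a short sketch that looks up
the real user id of the current process.  It assumes a Unix-like system
and allows for the case where no passwd entry exists.

```perl
use strict;
use warnings;

# Look up the account for the real user id of this process.
my $login = getpwuid($<);        # scalar context: login name, or undef
my @entry = getpwuid($<);        # list context: all fields, or an empty list

if (@entry) {
    my ($name, $passwd, $uid, $gid) = @entry;
    print "uid $uid is $name (primary gid $gid)\n";
}
else {
    print "no passwd entry for uid $<\n";
}
```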

In I<getpw*()> the fields $quota, $comment, and $expire are special
cases in the sense that in many systems they are unsupported.  If the
$quota is unsupported, it is an empty scalar.  If it is supported, it
usually encodes the disk quota.  If the $comment field is unsupported,
it is an empty scalar.  If it is supported it usually encodes some
administrative comment about the user.  In some systems the $quota
field may be $change or $age, fields that have to do with password
aging.  In some systems the $comment field may be $class.  The $expire
field, if present, encodes the expiration period of the account or the
password.  For the availability and the exact meaning of these fields
in your system, please consult your getpwnam(3) documentation and your
F<pwd.h> file.  You can also find out from within Perl what your
$quota and $comment fields mean and whether you have the $expire field
by using the C<Config> module and the values C<d_pwquota>, C<d_pwage>,
C<d_pwchange>, C<d_pwcomment>, and C<d_pwexpire>.  Shadow password
files are supported only if your vendor has implemented them in the
intuitive fashion, so that calling the regular C library routines gets
the shadow versions if you're running under privilege, or if the
shadow(3) functions exist as found in System V (this includes Solaris
and Linux).  Systems that implement a proprietary shadow password
facility are unlikely to be supported.

The $members value returned by I<getgr*()> is a space separated list of
the login names of the members of the group.

For the I<gethost*()> functions, if the C<h_errno> variable is supported in
C, it will be returned to you via C<$?> if the function call fails.  The
C<@addrs> value returned by a successful call is a list of the raw
addresses returned by the corresponding system library call.  In the
Internet domain, each address is four bytes long and you can unpack it
by saying something like:

    ($a,$b,$c,$d) = unpack('C4',$addrs[0]);

The Socket library makes this slightly easier:

    use Socket;
    $iaddr = inet_aton("127.1"); # or whatever address
    $name  = gethostbyaddr($iaddr, AF_INET);

    # or going the other way
    $straddr = inet_ntoa($iaddr);

If you get tired of remembering which element of the return list
contains which return value, by-name interfaces are provided
in standard modules: C<File::stat>, C<Net::hostent>, C<Net::netent>,
C<Net::protoent>, C<Net::servent>, C<Time::gmtime>, C<Time::localtime>,
C<User::grent>, and C<User::pwent>.  These override the normal
built-ins, supplying versions that return objects with the appropriate
names for each field.  For example:

   use File::stat;
   use User::pwent;
   $is_his = (stat($filename)->uid == getpwnam($whoever)->uid);

Even though it looks like they're the same method calls (uid), 
they aren't, because a C<File::stat> object is different from 
a C<User::pwent> object.

=item getsockname SOCKET

Returns the packed sockaddr address of this end of the SOCKET connection,
in case you don't know the address because you have several different
IPs that the connection might have come in on.

    use Socket;
    $mysockaddr = getsockname(SOCK);
    ($port, $myaddr) = sockaddr_in($mysockaddr);
    printf "Connect to %s [%s]\n", 
       scalar gethostbyaddr($myaddr, AF_INET),
       inet_ntoa($myaddr);

=item getsockopt SOCKET,LEVEL,OPTNAME

Returns the socket option requested, or undef if there is an error.

=item glob EXPR

=item glob

Returns the value of EXPR with filename expansions such as the
standard Unix shell F</bin/csh> would do.  This is the internal function
implementing the C<< <*.c> >> operator, but you can use it directly.
If EXPR is omitted, C<$_> is used.  The C<< <*.c> >> operator is
discussed in more detail in L<perlop/"I/O Operators">.

Beginning with v5.6.0, this operator is implemented using the standard
C<File::Glob> extension.  See L<File::Glob> for details.
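
Here is a short sketch of both forms against a scratch directory.  The
file names are made up, and it assumes the temporary directory's path
contains no whitespace or wildcard characters of its own.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Populate a scratch directory with a few empty files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(main.c util.c util.h)) {
    open(my $fh, '>', "$dir/$name") or die "Can't create $dir/$name: $!";
    close($fh);
}

my @c_files = glob("$dir/*.c");      # function form
my @same    = <$dir/*.c>;            # operator form, same expansion

my @names = sort map { basename($_) } @c_files;
print "@names\n";
```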

=item gmtime EXPR

Converts a time as returned by the time function to an 8-element list
with the time localized for the standard Greenwich time zone.
Typically used as follows:

    #  0    1    2     3     4    5     6     7  
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday) =
					    gmtime(time);

All list elements are numeric, and come straight out of the C `struct
tm'.  $sec, $min, and $hour are the seconds, minutes, and hours of the
specified time.  $mday is the day of the month, and $mon is the month
itself, in the range C<0..11> with 0 indicating January and 11
indicating December.  $year is the number of years since 1900.  That
is, $year is C<123> in year 2023.  $wday is the day of the week, with
0 indicating Sunday and 3 indicating Wednesday.  $yday is the day of
the year, in the range C<0..364> (or C<0..365> in leap years.)  

Note that the $year element is I<not> simply the last two digits of
the year.  If you assume it is, then you create non-Y2K-compliant
programs--and you wouldn't want to do that, would you?

The proper way to get a complete 4-digit year is simply:

	$year += 1900;

And to get the last two digits of the year (e.g., '01' in 2001) do:

	$year = sprintf("%02d", $year % 100);

If EXPR is omitted, C<gmtime()> uses the current time (C<gmtime(time)>).

In scalar context, C<gmtime()> returns the ctime(3) value:

    $now_string = gmtime;  # e.g., "Thu Oct 13 04:54:34 1994"

Also see the C<timegm> function provided by the C<Time::Local> module,
and the strftime(3) function available via the POSIX module.

This scalar value is B<not> locale dependent (see L<perllocale>), but
is instead a Perl builtin.  Also see the C<Time::Local> module, and the
strftime(3) and mktime(3) functions available via the POSIX module.  To
get somewhat similar but locale dependent date strings, set up your
locale environment variables appropriately (please see L<perllocale>)
and try for example:

    use POSIX qw(strftime);
    $now_string = strftime "%a %b %e %H:%M:%S %Y", gmtime;

Note that the C<%a> and C<%b> escapes, which represent the short forms
of the day of the week and the month of the year, may not necessarily
be three characters wide in all locales.
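
The field adjustments described above can be sketched as follows.  The
epoch value is arbitrary, and C<timegm> from C<Time::Local> is used to
go back the other way; passing it a four-digit year sidesteps its
guessing rules for two-digit years.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $when = 1_000_000_000;    # an arbitrary epoch value, for illustration

my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday) = gmtime($when);

# $mon is zero-based and $year counts from 1900, so adjust both for display.
my $stamp = sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                    $year + 1900, $mon + 1, $mday, $hour, $min, $sec);

# timegm() is the inverse of gmtime() for the first six fields.
my $back = timegm($sec, $min, $hour, $mday, $mon, $year + 1900);

print "$stamp UTC\n";        # 2001-09-09 01:46:40 UTC
```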

=item goto LABEL

=item goto EXPR

=item goto &NAME

The C<goto-LABEL> form finds the statement labeled with LABEL and resumes
execution there.  It may not be used to go into any construct that
requires initialization, such as a subroutine or a C<foreach> loop.  It
also can't be used to go into a construct that is optimized away,
or to get out of a block or subroutine given to C<sort>.
It can be used to go almost anywhere else within the dynamic scope,
including out of subroutines, but it's usually better to use some other
construct such as C<last> or C<die>.  The author of Perl has never felt the
need to use this form of C<goto> (in Perl, that is--C is another matter).

The C<goto-EXPR> form expects a label name, whose scope will be resolved
dynamically.  This allows for computed C<goto>s per FORTRAN, but isn't
necessarily recommended if you're optimizing for maintainability:

    goto ("FOO", "BAR", "GLARCH")[$i];

The C<goto-&NAME> form is quite different from the other forms of C<goto>.
In fact, it isn't a goto in the normal sense at all, and doesn't have
the stigma associated with other gotos.  Instead, it
substitutes a call to the named subroutine for the currently running
subroutine.  This is used by C<AUTOLOAD> subroutines that wish to load
another subroutine and then pretend that the other subroutine had been
called in the first place (except that any modifications to C<@_>
in the current subroutine are propagated to the other subroutine.)
After the C<goto>, not even C<caller> will be able to tell that this
routine was called first.

NAME needn't be the name of a subroutine; it can be a scalar variable
containing a code reference, or a block which evaluates to a code
reference.
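
A rough sketch of the C<AUTOLOAD> idiom follows.  The package name
C<Lazy> and the generated subroutine bodies are invented for the
example; a real module would load or build something more useful.

```perl
use strict;
use warnings;

package Lazy;   # hypothetical package name, for illustration only

our $AUTOLOAD;

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                 # strip the package qualifier
    return if $name eq 'DESTROY';      # never autoload destructors

    # Define the missing subroutine on first use...
    no strict 'refs';
    *{"Lazy::$name"} = sub { return "$name(" . join(',', @_) . ")" };

    # ...then replace this AUTOLOAD call with a call to it, @_ intact.
    goto &{"Lazy::$name"};
}

package main;

my $first  = Lazy::greet('a', 'b');   # goes through AUTOLOAD
my $second = Lazy::greet('c');        # calls the now-defined sub directly
```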

=item grep BLOCK LIST

=item grep EXPR,LIST

This is similar in spirit to, but not the same as, grep(1) and its
relatives.  In particular, it is not limited to using regular expressions.

Evaluates the BLOCK or EXPR for each element of LIST (locally setting
C<$_> to each element) and returns the list value consisting of those
elements for which the expression evaluated to true.  In scalar
context, returns the number of times the expression was true.

    @foo = grep(!/^#/, @bar);    # weed out comments

or equivalently,

    @foo = grep {!/^#/} @bar;    # weed out comments

Note that C<$_> is an alias to the list value, so it can be used to
modify the elements of the LIST.  While this is useful and supported,
it can cause bizarre results if the elements of LIST are not variables.
Similarly, grep returns aliases into the original list, much as a for
loop's index variable aliases the list elements.  That is, modifying an
element of a list returned by grep (for example, in a C<foreach>, C<map>
or another C<grep>) actually modifies the element in the original list.
This is usually something to be avoided when writing clear code.
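
The two contexts and the aliasing behavior can be seen in a small
sketch like this one (the sample data is made up):

```perl
use strict;
use warnings;

my @words = ('apple', '#comment', 'banana', 'avocado');

my @kept  = grep { !/^#/ } @words;    # list context: the matching elements
my $count = grep { !/^#/ } @words;    # scalar context: how many matched

# The elements grep returns are aliases into @words, so changing them
# inside the loop changes @words itself.
$_ = uc($_) for grep { /^a/ } @words;
```

After the last statement, the entries of C<@words> that began with "a"
are uppercased in place, while C<@kept> (a copy made earlier) is not
affected.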

See also L</map> for a list composed of the results of the BLOCK or EXPR.

=item hex EXPR

=item hex

Interprets EXPR as a hex string and returns the corresponding value.
(To convert strings that might start with either 0, 0x, or 0b, see
L</oct>.)  If EXPR is omitted, uses C<$_>.

    print hex '0xAf'; # prints '175'
    print hex 'aF';   # same

Hex strings may only represent integers.  Strings that would cause
integer overflow trigger a warning.
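
As a quick comparison of C<hex> with C<oct>, which looks at the prefix
to decide the base:

```perl
use strict;
use warnings;

my $plain    = hex('ff');       # 255: no prefix needed
my $prefixed = hex('0xFF');     # 255: a leading 0x is accepted
my $from_hex = oct('0xFF');     # 255: oct sees the 0x prefix
my $from_bin = oct('0b1010');   # 10:  oct sees the 0b prefix
my $from_oct = oct('0755');     # 493: otherwise oct reads octal
```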

=item import

There is no builtin C<import> function.  It is just an ordinary
method (subroutine) defined (or inherited) by modules that wish to export
names to another module.  The C<use> function calls the C<import> method
for the package used.  See also L</use>, L<perlmod>, and L<Exporter>.

=item index STR,SUBSTR,POSITION

=item index STR,SUBSTR

The index function searches for one string within another, but without
the wildcard-like behavior of a full regular-expression pattern match.
It returns the position of the first occurrence of SUBSTR in STR at
or after POSITION.  If POSITION is omitted, starts searching from the
beginning of the string.  The return value is based at C<0> (or whatever
you've set the C<$[> variable to--but don't do that).  If the substring
is not found, returns one less than the base, ordinarily C<-1>.
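
One common use of the POSITION argument is to walk through every
occurrence.  Here is a sketch; C<all_positions> is a hypothetical
helper, and it assumes C<$[> has been left alone so that "not found"
is C<-1>.

```perl
use strict;
use warnings;

# Hypothetical helper: return every position at which $substr occurs.
sub all_positions {
    my ($str, $substr) = @_;
    return () if $substr eq '';          # an empty needle would never advance
    my @found;
    my $pos = index($str, $substr);
    while ($pos != -1) {
        push @found, $pos;
        $pos = index($str, $substr, $pos + 1);   # resume just past the hit
    }
    return @found;
}

my @hits = all_positions("banana", "an");   # (1, 3)
my @none = all_positions("banana", "x");    # ()
```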

=item int EXPR

=item int

Returns the integer portion of EXPR.  If EXPR is omitted, uses C<$_>.
You should not use this function for rounding: one because it truncates
towards C<0>, and two because machine representations of floating point
numbers can sometimes produce counterintuitive results.  For example,
C<int(-6.725/0.025)> produces -268 rather than the correct -269; that's
because it's really more like -268.99999999999994315658 instead.  Usually,
the C<sprintf>, C<printf>, or the C<POSIX::floor> and C<POSIX::ceil>
functions will serve you better than will int().
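
The difference between truncating and rounding can be sketched like
this, using the same quotient as above:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

my $quotient  = -6.725 / 0.025;             # slightly above -269 in binary floating point
my $truncated = int($quotient);             # chops toward zero
my $rounded   = sprintf("%.0f", $quotient); # rounds to the nearest integer

my $down   = floor(-2.5);                   # -3: always toward negative infinity
my $up     = ceil(-2.5);                    # -2: always toward positive infinity
my $toward = int(-2.5);                     # -2: toward zero

print "int: $truncated, sprintf: $rounded\n";
```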

=item ioctl FILEHANDLE,FUNCTION,SCALAR

Implements the ioctl(2) function.  You'll probably first have to say

    require "ioctl.ph";	# probably in /usr/local/lib/perl/ioctl.ph

to get the correct function definitions.  If F<ioctl.ph> doesn't
exist or doesn't have the correct definitions you'll have to roll your
own, based on your C header files such as F<< <sys/ioctl.h> >>.
(There is a Perl script called B<h2ph> that comes with the Perl kit that
may help you in this, but it's nontrivial.)  SCALAR will be read and/or
written depending on the FUNCTION--a pointer to the string value of SCALAR
will be passed as the third argument of the actual C<ioctl> call.  (If SCALAR
has no string value but does have a numeric value, that value will be
passed rather than a pointer to the string value.  To guarantee this to be
true, add a C<0> to the scalar before using it.)  The C<pack> and C<unpack>
functions may be needed to manipulate the values of structures used by
C<ioctl>.  

The return value of C<ioctl> (and C<fcntl>) is as follows:

	if OS returns:		then Perl returns:
	    -1	  		  undefined value
	     0	 		string "0 but true"
	anything else		    that number

Thus Perl returns true on success and false on failure, yet you can
still easily determine the actual value returned by the operating
system:

    $retval = ioctl(...) || -1;
    printf "System returned %d\n", $retval;

The special string "C<0> but true" is exempt from B<-w> complaints
about improper numeric conversions.
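
The return convention can be exercised without making a real system
call.  In this sketch, C<os_value> is a hypothetical helper that maps a
Perl-side return value back to the number the operating system would
have reported.

```perl
use strict;
use warnings;

# Hypothetical helper: undo the mapping shown in the table above.
sub os_value {
    my ($ret) = @_;
    return -1 unless defined $ret;   # undef means the OS returned -1
    return $ret + 0;                 # "0 but true" becomes 0, quietly
}

my $zero   = os_value("0 but true"); # 0
my $failed = os_value(undef);        # -1
my $other  = os_value(5);            # 5

# The special string is true in boolean context even though it is 0 numerically.
my $is_true = "0 but true" ? 1 : 0;
```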

Here's an example of setting a filehandle named C<REMOTE> to be
non-blocking at the system level.  You'll have to negotiate C<$|>
on your own, though.

    use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

    $flags = fcntl(REMOTE, F_GETFL, 0)
                or die "Can't get flags for the socket: $!\n";

    $flags = fcntl(REMOTE, F_SETFL, $flags | O_NONBLOCK)
                or die "Can't set flags for the socket: $!\n";

=item join EXPR,LIST

Joins the separate strings of LIST into a single string with fields
separated by the value of EXPR, and returns that new string.  Example:

    $rec = join(':', $login,$passwd,$uid,$gid,$gcos,$home,$shell);

Beware that unlike C<split>, C<join> doesn't take a pattern as its
first argument.  Compare L</split>.
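
The asymmetry shows up as soon as the separator is a character that is
special in patterns.  A small sketch with made-up data:

```perl
use strict;
use warnings;

my @fields = ('alice', 'x', 1000, 1000, '/home/alice', '/bin/sh');
my $rec    = join(':', @fields);
my @again  = split(/:/, $rec);

# join treats '.' as a literal dot; split needs it escaped, because an
# unescaped dot in a pattern matches any character.
my $dotted = join('.', 192, 168, 0, 1);
my @octets = split(/\./, $dotted);
```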

=item keys HASH

Returns a list consisting of all the keys of the named hash.  (In
scalar context, returns the number of keys.)  The keys are returned in
an apparently random order.  The actual random order is subject to
change in future versions of perl, but it is guaranteed to be the same
order as either the C<values> or C<each> function produces (given
that the hash has not been modified).  As a side effect, it resets
HASH's iterator.

Here is yet another way to print your environment:

    @keys = keys %ENV;
    @values = values %ENV;
    while (@keys) { 
	print pop(@keys), '=', pop(@values), "\n";
    }

or how about sorted by key:

    foreach $key (sort(keys %ENV)) {
	print $key, '=', $ENV{$key}, "\n";
    }

The returned values are copies of the original keys in the hash, so
modifying them will not affect the original hash.  Compare L</values>.

To sort a hash by value, you'll need to use a C<sort> function.
Here's a descending numeric sort of a hash by its values:

    foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
	printf "%4d %s\n", $hash{$key}, $key;
    }

As an lvalue C<keys> allows you to increase the number of hash buckets
allocated for the given hash.  This can gain you a measure of efficiency if
you know the hash is going to get big.  (This is similar to pre-extending
an array by assigning a larger number to $#array.)  If you say

    keys %hash = 200;

then C<%hash> will have at least 200 buckets allocated for it--256 of them,
in fact, since it rounds up to the next power of two.  These
buckets will be retained even if you do C<%hash = ()>; use C<undef
%hash> if you want to free the storage while C<%hash> is still in scope.
You can't shrink the number of buckets allocated for the hash using
C<keys> in this way (but you needn't worry about doing this by accident,
as trying has no effect).

See also C<each>, C<values> and C<sort>.
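
The ordering guarantee and the lvalue form can be sketched together
(the hash contents are made up):

```perl
use strict;
use warnings;

my %stock = (apple => 3, pear => 7, plum => 1);

# keys and values come back in the same order, so the two lists line up.
my @names  = keys %stock;
my @counts = values %stock;
my %rebuilt;
@rebuilt{@names} = @counts;

my $how_many = keys %stock;      # scalar context: number of keys

# Pre-extend a hash that is expected to grow; this adds no keys.
my %big;
keys(%big) = 200;
$big{$_} = 1 for 1 .. 10;
```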

=item kill SIGNAL, LIST

Sends a signal to a list of processes.  Returns the number of
processes successfully signaled (which is not necessarily the
same as the number actually killed).

    $cnt = kill 1, $child1, $child2;
    kill 9, @goners;

If SIGNAL is zero, no signal is sent to the process.  This is a
useful way to check that the process is alive and hasn't changed
its UID.  See L<perlport> for notes on the portability of this
construct.
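
A minimal sketch of the signal-zero check follows.  C<is_running> is a
hypothetical helper; note that it also returns false for a process that
exists but that you are not permitted to signal.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $pid exists and we are allowed to signal it.
sub is_running {
    my ($pid) = @_;
    return kill(0, $pid) ? 1 : 0;    # signal 0 checks without sending anything
}

my $self_alive = is_running($$);     # our own process is certainly alive
print "process $$ is ", ($self_alive ? "running" : "not reachable"), "\n";
```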

Unlike in the shell, if SIGNAL is negative, it kills
process groups instead of processes.  (On System V, a negative I<PROCESS>
number will also kill process groups, but that's not portable.)  That
means you usually want to use positive not negative signals.  You may also
use a signal name in quotes.  See L<perlipc/"Signals"> for details.

=item last LABEL

=item last

The C<last> command is like the C<break> statement in C (as used in
loops); it immediately exits the loop in question.  If the LABEL is
omitted, the command refers to the innermost enclosing loop.  The
C<continue> block, if any, is not executed:

    LINE: while (<STDIN>) {
	last LINE if /^$/;	# exit when done with header
	#...
    }

C<last> cannot be used to exit a block which returns a value such as
C<eval {}>, C<sub {}> or C<do {}>, and should not be used to exit
a grep() or map() operation.

Note that a block by itself is semantically identical to a loop
that executes once.  Thus C<last> can be used to effect an early
exit out of such a block.
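
Both points can be seen in a short sketch: the C<continue> block is
skipped on the iteration that calls C<last>, and a bare block can be
left early in the same way.  The label C<SETUP> is arbitrary.

```perl
use strict;
use warnings;

my @seen;
my $continued = 0;
for my $n (1 .. 5) {
    last if $n == 3;             # leaves the loop; continue is skipped this time
    push @seen, $n;
}
continue {
    $continued++;                # runs only after iterations 1 and 2
}

my @steps;
SETUP: {
    push @steps, 'begin';
    last SETUP if @steps;        # early exit from a bare block
    push @steps, 'never reached';
}
```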

See also L</continue> for an illustration of how C<last>, C<next>, and
C<redo> work.

=item lc EXPR

=item lc

Returns a lowercased version of EXPR.  This is the internal function
implementing the C<\L> escape in double-quoted strings.
Respects current LC_CTYPE locale if C<use locale> in force.  See L<perllocale>
and L<utf8>.

If EXPR is omitted, uses C<$_>.

=item lcfirst EXPR

=item lcfirst

Returns the value of EXPR with the first character lowercased.  This is
the internal function implementing the C<\l> escape in double-quoted strings.
Respects current LC_CTYPE locale if C<use locale> in force.  See L<perllocale>.

If EXPR is omitted, uses C<$_>.

=item length EXPR

=item length

Returns the length in characters of the value of EXPR.  If EXPR is
omitted, returns length of C<$_>.  Note that this cannot be used on 
an entire array or hash to find out how many elements these have.
For that, use C<scalar @array> and C<scalar keys %hash> respectively.

=item link OLDFILE,NEWFILE

Creates a new filename linked to the old filename.  Returns true for
success, false otherwise. 

=item listen SOCKET,QUEUESIZE

Does the same thing that the listen system call does.  Returns true if
it succeeded, false otherwise.  See the example in 
L<perlipc/"Sockets: Client/Server Communication">.

=item local EXPR

You really probably want to be using C<my> instead, because C<local> isn't
what most people think of as "local".  See 
L<perlsub/"Private Variables via my()"> for details.

A local modifies the listed variables to be local to the enclosing
block, file, or eval.  If more than one value is listed, the list must
be placed in parentheses.  See L<perlsub/"Temporary Values via local()">
for details, including issues with tied arrays and hashes.
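
The difference between the two is easiest to see when one subroutine
calls another.  In this sketch the variable and subroutine names are
invented for the example.

```perl
use strict;
use warnings;

our $level = 'global';

sub current_level { return $level }

sub with_local {
    local $level = 'temporary';   # visible to anything called from here
    return current_level();
}

sub with_my {
    my $level = 'private';        # a new lexical; current_level() can't see it
    return current_level();
}

my $via_local = with_local();     # 'temporary'
my $via_my    = with_my();        # 'global'
my $afterward = current_level();  # 'global' again: local's change was undone
```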

=item localtime EXPR

Converts a time as returned by the time function to a 9-element list
with the time analyzed for the local time zone.  Typically used as
follows:

    #  0    1    2     3     4    5     6     7     8
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
						localtime(time);

All list elements are numeric, and come straight out of the C `struct
tm'.  $sec, $min, and $hour are the seconds, minutes, and hours of the
specified time.  $mday is the day of the month, and $mon is the month
itself, in the range C<0..11> with 0 indicating January and 11
indicating December.  $year is the number of years since 1900.  That
is, $year is C<123> in year 2023.  $wday is the day of the week, with
0 indicating Sunday and 3 indicating Wednesday.  $yday is the day of
the year, in the range C<0..364> (or C<0..365> in leap years.)  $isdst
is true if the specified time occurs during daylight savings time,
false otherwise.

Note that the $year element is I<not> simply the last two digits of
the year.  If you assume it is, then you create non-Y2K-compliant
programs--and you wouldn't want to do that, would you?

The proper way to get a complete 4-digit year is simply:

	$year += 1900;

And to get the last two digits of the year (e.g., '01' in 2001) do:

	$year = sprintf("%02d", $year % 100);

If EXPR is omitted, C<localtime()> uses the current time (C<localtime(time)>).

In scalar context, C<localtime()> returns the ctime(3) value:

    $now_string = localtime;  # e.g., "Thu Oct 13 04:54:34 1994"

This scalar value is B<not> locale dependent (see L<perllocale>), but is
instead a Perl builtin.  Also see the C<Time::Local> module
(to convert the seconds, minutes, hours, ... back to seconds since the
stroke of midnight the 1st of January 1970, the value returned by
time()), and the strftime(3) and mktime(3) functions available via the
POSIX module.  To get somewhat similar but locale dependent date
strings, set up your locale environment variables appropriately
(please see L<perllocale>) and try for example:

    use POSIX qw(strftime);
    $now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;

Note that the C<%a> and C<%b> escapes, which represent the short forms
of the day of the week and the month of the year, may not necessarily
be three characters wide in all locales.

=item lock

    lock I<THING>

This function places an advisory lock on a variable, subroutine,
or referenced object contained in I<THING> until the lock goes out
of scope.  This is a built-in function only if your version of Perl
was built with threading enabled, and if you've said C<use Thread>.
Otherwise a user-defined function by this name will be called.  See
L<Thread>.

=item log EXPR

=item log

Returns the natural logarithm (base I<e>) of EXPR.  If EXPR is omitted,
returns log of C<$_>.  To get the log of another base, use basic algebra:
The base-N log of a number is equal to the natural log of that number
divided by the natural log of N.  For example:

    sub log10 {
	my $n = shift;
	return log($n)/log(10);
    } 

See also L</exp> for the inverse operation.
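
The same algebra generalizes to any base.  In this sketch C<log_base>
is a hypothetical helper; because the results are floating point, they
should be compared with a tolerance rather than for exact equality.

```perl
use strict;
use warnings;

# Hypothetical helper: logarithm of $n in an arbitrary base.
sub log_base {
    my ($n, $base) = @_;
    return log($n) / log($base);
}

my $bits       = log_base(1024, 2);   # very close to 10
my $round_trip = exp(log(5));         # very close to 5

printf "%.3f %.3f\n", $bits, $round_trip;
```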

=item lstat FILEHANDLE

=item lstat EXPR

=item lstat

Does the same thing as the C<stat> function (including setting the
special C<_> filehandle) but stats a symbolic link instead of the file
the symbolic link points to.  If symbolic links are unimplemented on
your system, a normal C<stat> is done.

If EXPR is omitted, stats C<$_>.

=item m//

The match operator.  See L<perlop>.

=item map BLOCK LIST

=item map EXPR,LIST

Evaluates the BLOCK or EXPR for each element of LIST (locally setting
C<$_> to each element) and returns the list value composed of the
results of each such evaluation.  In scalar context, returns the
total number of elements so generated.  Evaluates BLOCK or EXPR in
list context, so each element of LIST may produce zero, one, or
more elements in the returned value.

    @chars = map(chr, @nums);

translates a list of numbers to the corresponding characters.  And

    %hash = map { getkey($_) => $_ } @array;

is just a funny way to write

    %hash = ();
    foreach $_ (@array) {
	$hash{getkey($_)} = $_;
    }

Note that C<$_> is an alias to the list value, so it can be used to
modify the elements of the LIST.  While this is useful and supported,
it can cause bizarre results if the elements of LIST are not variables.
Using a regular C<foreach> loop for this purpose would be clearer in
most cases.  See also L</grep> for an array composed of those items of
the original list for which the BLOCK or EXPR evaluates to true.
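
Because the BLOCK is evaluated in list context, each input element can
contribute any number of output elements.  A small sketch:

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);

# Odd numbers yield two elements each; even numbers yield none.
my @expanded = map { $_ % 2 ? ($_, $_ * 10) : () } @nums;

# In scalar context map returns how many elements were generated.
my $count = map { ($_, $_) } @nums;

# A common idiom: build a lookup hash from a list.
my %seen = map { $_ => 1 } @nums;
```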

C<{> starts both hash references and blocks, so C<map { ...> could be either
the start of map BLOCK LIST or map EXPR, LIST.  Because perl doesn't look
ahead for the closing C<}> it has to take a guess at which it's dealing with
based on what it finds just after the C<{>.  Usually it gets it right, but if
it doesn't it won't realize something is wrong until it gets to the C<}> and
encounters the missing (or unexpected) comma.  The syntax error will be
reported close to the C<}> but you'll need to change something near the C<{>
such as using a unary C<+> to give perl some help:

    %hash = map {  "\L$_", 1  } @array  # perl guesses EXPR.  wrong
    %hash = map { +"\L$_", 1  } @array  # perl guesses BLOCK. right
    %hash = map { ("\L$_", 1) } @array  # this also works
    %hash = map {  lc($_), 1  } @array  # as does this.
    %hash = map +( lc($_), 1 ), @array  # this is EXPR and works!

    %hash = map  ( lc($_), 1 ), @array  # evaluates to (1, @array)

or to force an anon hash constructor use C<+{>

   @hashes = map +{ lc($_), 1 }, @array # EXPR, so needs , at end

and you get a list of anonymous hashes, each with only 1 entry.

=item mkdir FILENAME,MASK

=item mkdir FILENAME

Creates the directory specified by FILENAME, with permissions
specified by MASK (as modified by C<umask>).  If it succeeds it
returns true, otherwise it returns false and sets C<$!> (errno).
If omitted, MASK defaults to 0777.

In general, it is better to create directories with permissive MASK,
and let the user modify that with their C<umask>, than it is to supply
a restrictive MASK and give the user no way to be more permissive.
The exceptions to this rule are when the file or directory should be
kept private (mail files, for instance).  The perlfunc(1) entry on
C<umask> discusses the choice of MASK in more detail.
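
The following sketch illustrates that advice.  It assumes a Unix-like
system and uses C<File::Temp> only to obtain a scratch directory; the
directory names F<spool> and F<mail> are made up for the example.  The
modes printed depend on the C<umask> in effect, so no output is shown.

    use File::Temp qw(tempdir);

    my $base = tempdir(CLEANUP => 1);   # scratch area, removed at exit

    # shared directory: permissive MASK, narrowed by the umask
    mkdir("$base/spool", 0777)
        or die "Can't mkdir $base/spool: $!";

    # private directory: restrictive MASK on purpose
    mkdir("$base/mail", 0700)
        or die "Can't mkdir $base/mail: $!";

    printf "%04o\n", (stat "$base/spool")[2] & 07777;
    printf "%04o\n", (stat "$base/mail")[2]  & 07777;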

=item msgctl ID,CMD,ARG

Calls the System V IPC function msgctl(2).  You'll probably have to say

    use IPC::SysV;

first to get the correct constant definitions.  If CMD is C<IPC_STAT>,
then ARG must be a variable which will hold the returned C<msqid_ds>
structure.  Returns like C<ioctl>: the undefined value for error,
C<"0 but true"> for zero, or the actual return value otherwise.  See also
L<perlipc/"SysV IPC">, C<IPC::SysV>, and C<IPC::Semaphore> documentation.

=item msgget KEY,FLAGS

Calls the System V IPC function msgget(2).  Returns the message queue
id, or the undefined value if there is an error.  See also
L<perlipc/"SysV IPC"> and C<IPC::SysV> and C<IPC::Msg> documentation.

=item msgrcv ID,VAR,SIZE,TYPE,FLAGS

Calls the System V IPC function msgrcv to receive a message from
message queue ID into variable VAR with a maximum message size of
SIZE.  Note that when a message is received, the message type as a
native long integer will be the first thing in VAR, followed by the
actual message.  This packing may be opened with C<unpack("l! a*")>.
Taints the variable.  Returns true if successful, or false if there is
an error.  See also L<perlipc/"SysV IPC">, C<IPC::SysV>, and
C<IPC::Msg> documentation.

=item msgsnd ID,MSG,FLAGS

Calls the System V IPC function msgsnd to send the message MSG to the
message queue ID.  MSG must begin with the native long integer message
type, followed by the message itself.  This kind of packing can be
achieved with C<pack("l! a*", $type, $message)>.  Returns true if
successful, or false if there is an error.  See also C<IPC::SysV>
and C<IPC::Msg> documentation.
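
The buffer layout produced by the C<pack> template quoted above can be
checked without touching a real message queue.  This is only a sketch:
the type C<3> and the text C<"hello"> are arbitrary, and no C<msgsnd>
or C<msgrcv> call is made.

    my ($type, $text) = (3, "hello");

    # buffer as handed to msgsnd(): native long type, then the bytes
    my $msg = pack("l! a*", $type, $text);

    # msgrcv() leaves the same layout in VAR, so unpack it the same way
    my ($got_type, $got_text) = unpack("l! a*", $msg);

    print "$got_type $got_text\n";    # 3 hello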

=item my EXPR

=item my EXPR : ATTRIBUTES

A C<my> declares the listed variables to be local (lexically) to the
enclosing block, file, or C<eval>.  If
more than one value is listed, the list must be placed in parentheses.  See
L<perlsub/"Private Variables via my()"> for details.

=item next LABEL

=item next

The C<next> command is like the C<continue> statement in C; it starts
the next iteration of the loop:

    LINE: while (<STDIN>) {
	next LINE if /^#/;	# discard comments
	#...
    }

Note that if there were a C<continue> block on the above, it would get
executed even on discarded lines.  If the LABEL is omitted, the command
refers to the innermost enclosing loop.

C<next> cannot be used to exit a block which returns a value such as
C<eval {}>, C<sub {}> or C<do {}>, and should not be used to exit
a grep() or map() operation.

Note that a block by itself is semantically identical to a loop
that executes once.  Thus C<next> will exit such a block early.

See also L</continue> for an illustration of how C<last>, C<next>, and
C<redo> work.
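
A small sketch of the interaction between C<next> and a C<continue>
block (the list of lines and the variable names are invented for the
example):

    my @kept;
    my $seen = 0;

    LINE: foreach my $line ("# one", "alpha", "# two", "beta") {
        next LINE if $line =~ /^#/;   # discard comments
        push @kept, $line;
    }
    continue {
        $seen++;                      # runs even for discarded lines
    }

    print "@kept\n";    # alpha beta
    print "$seen\n";    # 4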

=item no Module LIST

See the L</use> function, which C<no> is the opposite of.

=item oct EXPR

=item oct

Interprets EXPR as an octal string and returns the corresponding
value.  (If EXPR happens to start off with C<0x>, interprets it as a
hex string.  If EXPR starts off with C<0b>, it is interpreted as a
binary string.)  The following will handle decimal, binary, octal, and
hex in the standard Perl or C notation:

    $val = oct($val) if $val =~ /^0/;

If EXPR is omitted, uses C<$_>.   To go the other way (produce a number
in octal), use sprintf() or printf():

    $perms = (stat("filename"))[2] & 07777;
    $oct_perms = sprintf "%lo", $perms;

The oct() function is commonly used when a string such as C<644> needs
to be converted into a file mode, for example. (Although perl will
automatically convert strings into numbers as needed, this automatic
conversion assumes base 10.)
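
For instance, the following sketch contrasts C<oct> with Perl's
ordinary base-10 string conversion:

    my $mode = oct("644");              # 420 decimal, the same as 0644
    printf "%d %o\n", $mode, $mode;     # 420 644

    print oct("0x1f"), "\n";            # 31
    print oct("0b101"), "\n";           # 5
    print "644" + 0, "\n";              # 644 (base 10, not a file mode)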

=item open FILEHANDLE,MODE,LIST

=item open FILEHANDLE,EXPR

=item open FILEHANDLE

Opens the file whose filename is given by EXPR, and associates it with
FILEHANDLE.  If FILEHANDLE is an expression, its value is used as the
name of the real filehandle wanted.  (This is considered a symbolic
reference, so C<use strict 'refs'> should I<not> be in effect.)

If EXPR is omitted, the scalar
variable of the same name as the FILEHANDLE contains the filename.
(Note that lexical variables--those declared with C<my>--will not work
for this purpose; so if you're using C<my>, specify EXPR in your call
to open.)  See L<perlopentut> for a kinder, gentler explanation of opening
files.

If MODE is C<< '<' >> or nothing, the file is opened for input.
If MODE is C<< '>' >>, the file is truncated and opened for
output, being created if necessary.  If MODE is C<<< '>>' >>>,
the file is opened for appending, again being created if necessary. 
You can put a C<'+'> in front of the C<< '>' >> or C<< '<' >> to indicate that
you want both read and write access to the file; thus C<< '+<' >> is almost
always preferred for read/write updates--the C<< '+>' >> mode would clobber the
file first.  You can't usually use either read-write mode for updating
textfiles, since they have variable length records.  See the B<-i>
switch in L<perlrun> for a better approach.  The file is created with
permissions of C<0666> modified by the process' C<umask> value.

These various prefixes correspond to the fopen(3) modes of C<'r'>, C<'r+'>,
C<'w'>, C<'w+'>, C<'a'>, and C<'a+'>.
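
As a sketch of the three basic modes, the following creates a file,
appends to it, and reads it back.  The scratch directory from
C<File::Temp> and the file name F<notes.txt> are assumptions made for
the example.

    use File::Temp qw(tempdir);

    my $path = tempdir(CLEANUP => 1) . "/notes.txt";

    open(OUT, '>', $path) or die "Can't create $path: $!";
    print OUT "first\n";                # file was truncated/created
    close(OUT) or die "Can't close $path: $!";

    open(OUT, '>>', $path) or die "Can't append to $path: $!";
    print OUT "second\n";               # added after existing data
    close(OUT) or die "Can't close $path: $!";

    open(IN, '<', $path) or die "Can't read $path: $!";
    my @lines = <IN>;
    close(IN);

    print @lines;       # first, then second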

In the 2-argument (and 1-argument) form of the call, the mode and
filename should be concatenated (in this order), possibly separated by
spaces.  It is possible to omit the mode if the mode is C<< '<' >>.

If the filename begins with C<'|'>, the filename is interpreted as a
command to which output is to be piped, and if the filename ends with a
C<'|'>, the filename is interpreted as a command which pipes output to
us.  See L<perlipc/"Using open() for IPC">
for more examples of this.  (You are not allowed to C<open> to a command
that pipes both in I<and> out, but see L<IPC::Open2>, L<IPC::Open3>,
and L<perlipc/"Bidirectional Communication with Another Process">
for alternatives.)

If MODE is C<'|-'>, the filename is interpreted as a
command to which output is to be piped, and if MODE is
C<'-|'>, the filename is interpreted as a command which pipes output to
us.  In the 2-argument (and 1-argument) form, one should replace the dash
(C<'-'>) with the command.  See L<perlipc/"Using open() for IPC">
for more examples of this.  (You are not allowed to C<open> to a command
that pipes both in I<and> out, but see L<IPC::Open2>, L<IPC::Open3>,
and L<perlipc/"Bidirectional Communication"> for alternatives.)

In the 2-argument (and 1-argument) form, opening C<'-'> opens STDIN
and opening C<< '>-' >> opens STDOUT.

Open returns
nonzero upon success, the undefined value otherwise.  If the C<open>
involved a pipe, the return value happens to be the pid of the
subprocess.

If you're unfortunate enough to be running Perl on a system that
distinguishes between text files and binary files (modern operating
systems don't care), then you should check out L</binmode> for tips for
dealing with this.  The key distinction between systems that need C<binmode>
and those that don't is their text file formats.  Systems like Unix, MacOS, and
Plan9, which delimit lines with a single character, and which encode that
character in C as C<"\n">, do not need C<binmode>.  The rest need it.

When opening a file, it's usually a bad idea to continue normal execution
if the request failed, so C<open> is frequently used in connection with
C<die>.  Even if C<die> won't do what you want (say, in a CGI script,
where you want to make a nicely formatted error message (but there are
modules that can help with that problem)) you should always check
the return value from opening a file.  The infrequent exception is when
working with an unopened filehandle is actually what you want to do.

Examples:

    $ARTICLE = 100;
    open ARTICLE or die "Can't find article $ARTICLE: $!\n";
    while (<ARTICLE>) {...

    open(LOG, '>>/usr/spool/news/twitlog');	# (log is reserved)
    # if the open fails, output is discarded

    open(DBASE, '+<', 'dbase.mine')		# open for update
	or die "Can't open 'dbase.mine' for update: $!";

    open(DBASE, '+<dbase.mine')			# ditto
	or die "Can't open 'dbase.mine' for update: $!";

    open(ARTICLE, '-|', "caesar <$article")     # decrypt article
	or die "Can't start caesar: $!";

    open(ARTICLE, "caesar <$article |")		# ditto
	or die "Can't start caesar: $!";

    open(EXTRACT, "|sort >/tmp/Tmp$$")		# $$ is our process id
	or die "Can't start sort: $!";

    # process argument list of files along with any includes

    foreach $file (@ARGV) {
	process($file, 'fh00');
    }

    sub process {
	my($filename, $input) = @_;
	$input++;		# this is a string increment
	unless (open($input, $filename)) {
	    print STDERR "Can't open $filename: $!\n";
	    return;
	}

	local $_;
	while (<$input>) {		# note use of indirection
	    if (/^#include "(.*)"/) {
		process($1, $input);
		next;
	    }
	    #...		# whatever
	}
    }

You may also, in the Bourne shell tradition, specify an EXPR beginning
with C<< '>&' >>, in which case the rest of the string is interpreted as the
name of a filehandle (or file descriptor, if numeric) to be
duped and opened.  You may use C<&> after C<< > >>, C<<< >> >>>,
C<< < >>, C<< +> >>, C<<< +>> >>>, and C<< +< >>.  The
mode you specify should match the mode of the original filehandle.
(Duping a filehandle does not take into account any existing contents of
stdio buffers.)  Duping file handles is not yet supported for 3-argument
open().

Here is a script that saves, redirects, and restores STDOUT and
STDERR:

    #!/usr/bin/perl
    open(OLDOUT, ">&STDOUT");
    open(OLDERR, ">&STDERR");

    open(STDOUT, '>', "foo.out") || die "Can't redirect stdout";
    open(STDERR, ">&STDOUT")     || die "Can't dup stdout";

    select(STDERR); $| = 1;	# make unbuffered
    select(STDOUT); $| = 1;	# make unbuffered

    print STDOUT "stdout 1\n";	# this works for
    print STDERR "stderr 1\n"; 	# subprocesses too

    close(STDOUT);
    close(STDERR);

    open(STDOUT, ">&OLDOUT");
    open(STDERR, ">&OLDERR");

    print STDOUT "stdout 2\n";
    print STDERR "stderr 2\n";

If you specify C<< '<&=N' >>, where C<N> is a number, then Perl will do an
equivalent of C's C<fdopen> of that file descriptor; this is more
parsimonious of file descriptors.  For example:

    open(FILEHANDLE, "<&=$fd")

Note that this feature depends on the fdopen() C library function.
On many UNIX systems, fdopen() is known to fail when file descriptors
exceed a certain value, typically 255. If you need more file
descriptors than that, consider rebuilding Perl to use the C<sfio>
library.
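
The difference between duping (C<&>) and the fdopen-style C<&=> can be
seen by comparing descriptor numbers.  This sketch assumes a Unix-like
system; the handle names and the temporary file from C<File::Temp> are
illustrative only.

    use File::Temp qw(tempfile);

    my ($orig, $path) = tempfile(UNLINK => 1);
    my $fd = fileno($orig);

    open(DUP, ">&$fd")    or die "Can't dup fd $fd: $!";
    open(ALIAS, ">&=$fd") or die "Can't fdopen fd $fd: $!";

    # the dup has a descriptor of its own, the alias reuses $fd
    print fileno(DUP)   != $fd ? "dup: new fd\n"    : "dup: same fd\n";
    print fileno(ALIAS) == $fd ? "alias: same fd\n" : "alias: new fd\n";

Because C<ALIAS> and C<$orig> share one descriptor, closing either of
them closes it for both.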

If you open a pipe on the command C<'-'>, i.e., either C<'|-'> or C<'-|'>
with the 2-argument (or 1-argument) form of open(), then
there is an implicit fork done, and the return value of open is the pid
of the child within the parent process, and C<0> within the child
process.  (Use C<defined($pid)> to determine whether the open was successful.)
The filehandle behaves normally for the parent, but i/o to that
filehandle is piped from/to the STDOUT/STDIN of the child process.
In the child process the filehandle isn't opened--i/o happens from/to
the new STDOUT or STDIN.  Typically this is used like the normal
piped open when you want to exercise more control over just how the
pipe command gets executed, such as when you are running setuid, and
don't want to have to scan shell commands for metacharacters.
The following triples are more or less equivalent:

    open(FOO, "|tr '[a-z]' '[A-Z]'");
    open(FOO, '|-', "tr '[a-z]' '[A-Z]'");
    open(FOO, '|-') || exec 'tr', '[a-z]', '[A-Z]';

    open(FOO, "cat -n '$file'|");
    open(FOO, '-|', "cat -n '$file'");
    open(FOO, '-|') || exec 'cat', '-n', $file;

See L<perlipc/"Safe Pipe Opens"> for more examples of this.
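
Here is a sketch of the forking form in full, assuming a system with a
working fork().  The child simply runs the current perl binary
(C<$^X>) as a stand-in for a real command; the argument C<'a b; c'> is
chosen to show that no shell ever interprets it.

    my $pid = open(KID, '-|');          # implicit fork
    die "Can't fork: $!" unless defined $pid;

    if ($pid == 0) {
        # child: STDOUT is the pipe; exec with a list bypasses the shell
        exec $^X, '-e', 'print "arg: $ARGV[0]\n"', 'a b; c'
            or die "Can't exec $^X: $!";
    }

    my $reply = <KID>;                  # parent reads the child's STDOUT
    close(KID) or warn "child exited with status $?";

    print $reply;       # arg: a b; c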

Beginning with v5.6.0, Perl will attempt to flush all files opened for
output before any operation that may do a fork, but this may not be
supported on some platforms (see L<perlport>).  To be safe, you may need
to set C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method
of C<IO::Handle> on any open handles.

On systems that support a
close-on-exec flag on files, the flag will be set for the newly opened
file descriptor as determined by the value of $^F.  See L<perlvar/$^F>.

Closing any piped filehandle causes the parent process to wait for the
child to finish, and returns the status value in C<$?>.

The filename passed to the 2-argument (or 1-argument) form of open()
will have leading and trailing
whitespace deleted, and the normal redirection characters
honored.  This property, known as "magic open",
can often be used to good effect.  A user could specify a filename of
F<"rsh cat file |">, or you could change certain filenames as needed:

    $filename =~ s/(.*\.gz)\s*$/gzip -dc < $1|/;
    open(FH, $filename) or die "Can't open $filename: $!";

Use the 3-argument form to open a file with arbitrary weird characters in it,

    open(FOO, '<', $file);

otherwise it's necessary to protect any leading and trailing whitespace:

    $file =~ s#^(\s)#./$1#;
    open(FOO, "< $file\0");

(this may not work on some bizarre filesystems).  One should
conscientiously choose between the I<magic> and 3-argument forms
of open():

    open IN, $ARGV[0];

will allow the user to specify an argument of the form C<"rsh cat file |">,
but will not work on a filename which happens to have a trailing space, while

    open IN, '<', $ARGV[0];

will have exactly the opposite restrictions.
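
The trade-off can be demonstrated with a file whose name ends in a
space.  This is a sketch that assumes a Unix-like filesystem, where
such names are legal; the name F<"report "> and the scratch directory
are invented for the example.

    use File::Temp qw(tempdir);

    my $name = tempdir(CLEANUP => 1) . "/report ";   # note the space

    open(OUT, '>', $name) or die "Can't create '$name': $!";
    print OUT "data\n";
    close(OUT) or die "Can't close '$name': $!";

    # 3-argument form: the name is taken literally
    open(IN, '<', $name) or die "Can't open '$name': $!";
    my $line = <IN>;
    close(IN);

    # 2-argument form: the trailing space is stripped, so this looks
    # for a file called "report" and is expected to fail
    my $magic_ok = open(IN, "< $name");
    close(IN) if $magic_ok;

    print $line;
    print $magic_ok ? "magic open succeeded\n" : "magic open failed\n";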

If you want a "real" C C<open> (see L<open(2)> on your system), then you
should use the C<sysopen> function, which involves no such magic (but
may use subtly different filemodes than Perl open(), which is mapped
to C fopen()).  This is
another way to protect your filenames from interpretation.  For example:

    use Fcntl;		# imports the O_* constants
    use IO::Handle;
    sysopen(HANDLE, $path, O_RDWR|O_CREAT|O_EXCL)
	or die "sysopen $path: $!";
    $oldfh = select(HANDLE); $| = 1; select($oldfh);
    print HANDLE "stuff $$\n";
    seek(HANDLE, 0, 0);
    print "File contains: ", <HANDLE>;

Using the constructor from the C<IO::Handle> package (or one of its
subclasses, such as C<IO::File> or C<IO::Socket>), you can generate anonymous
filehandles that have the scope of whatever variables hold references to
them, and automatically close whenever and however you leave that scope:

    use IO::File;
    #...
    sub read_myfile_munged {
	my $ALL = shift;
	my $handle = new IO::File;
	open($handle, "myfile") or die "myfile: $!";
	$first = <$handle>
	    or return ();     # Automatically closed here.
	mung $first or die "mung failed";	# Or here.
	return $first, <$handle> if $ALL;	# Or here.
	$first;					# Or here.
    }

See L</seek> for some details about mixing reading and writing.

=item opendir DIRHANDLE,EXPR

Opens a directory named EXPR for processing by C<readdir>, C<telldir>,
C<seekdir>, C<rewinddir>, and C<closedir>.  Returns true if successful.
DIRHANDLEs have their own namespace separate from FILEHANDLEs.
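
A typical use, sketched here with a scratch directory from
C<File::Temp> and two made-up file names, reads the entries and skips
the F<.> and F<..> entries that C<readdir> also returns:

    use File::Temp qw(tempdir);

    my $dir = tempdir(CLEANUP => 1);
    foreach my $name (qw(b.txt a.txt)) {
        open(FH, '>', "$dir/$name") or die "Can't create $dir/$name: $!";
        close(FH);
    }

    opendir(DIR, $dir) or die "Can't opendir $dir: $!";
    my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir(DIR);
    closedir(DIR);

    print "@names\n";   # a.txt b.txt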

=item ord EXPR

=item ord

Returns the numeric (ASCII or Unicode) value of the first character of EXPR.  If
EXPR is omitted, uses C<$_>.  For the reverse, see L</chr>.
See L<utf8> for more about Unicode.

=item our EXPR

An C<our> declares the listed variables to be valid globals within
the enclosing block, file, or C<eval>.  That is, it has the same
scoping rules as a "my" declaration, but does not create a local
variable.  If more than one value is listed, the list must be placed
in parentheses.  The C<our> declaration has no semantic effect unless
"use strict vars" is in effect, in which case it lets you use the
declared global variable without qualifying it with a package name.
(But only within the lexical scope of the C<our> declaration.  In this
it differs from "use vars", which is package scoped.)

An C<our> declaration declares a global variable that will be visible
across its entire lexical scope, even across package boundaries.  The
package in which the variable is entered is determined at the point
of the declaration, not at the point of use.  This means the following
behavior holds:

    package Foo;
    our $bar;		# declares $Foo::bar for rest of lexical scope
    $bar = 20;

    package Bar;
    print $bar;		# prints 20

Multiple C<our> declarations in the same lexical scope are allowed
if they are in different packages.  If they happen to be in the same
package, Perl will emit warnings if you have asked for them.

    use warnings;
    package Foo;
    our $bar;		# declares $Foo::bar for rest of lexical scope
    $bar = 20;

    package Bar;
    our $bar = 30;	# declares $Bar::bar for rest of lexical scope
    print $bar;		# prints 30

    our $bar;		# emits warning

=item pack TEMPLATE,LIST

Takes a LIST of values and converts it into a string using the rules
given by the TEMPLATE.  The resulting string is the concatenation of
the converted values.  Typically, each converted value looks
like its machine-level representation.  For example, on 32-bit machines
a converted integer may be represented by a sequence of 4 bytes.

The TEMPLATE is a
sequence of characters that give the order and type of values, as
follows:

    a	A string with arbitrary binary data, will be null padded.
    A	An ASCII string, will be space padded.
    Z	A null terminated (asciz) string, will be null padded.

    b	A bit string (ascending bit order inside each byte, like vec()).
    B	A bit string (descending bit order inside each byte).
    h	A hex string (low nybble first).
    H	A hex string (high nybble first).

    c	A signed char value.
    C	An unsigned char value.  Only does bytes.  See U for Unicode.

    s	A signed short value.
    S	An unsigned short value.
	  (This 'short' is _exactly_ 16 bits, which may differ from
	   what a local C compiler calls 'short'.  If you want
	   native-length shorts, use the '!' suffix.)

    i	A signed integer value.
    I	An unsigned integer value.
	  (This 'integer' is _at_least_ 32 bits wide.  Its exact
           size depends on what a local C compiler calls 'int',
           and may even be larger than the 'long' described in
           the next item.)

    l	A signed long value.
    L	An unsigned long value.
	  (This 'long' is _exactly_ 32 bits, which may differ from
	   what a local C compiler calls 'long'.  If you want
	   native-length longs, use the '!' suffix.)

    n	An unsigned short in "network" (big-endian) order.
    N	An unsigned long in "network" (big-endian) order.
    v	An unsigned short in "VAX" (little-endian) order.
    V	An unsigned long in "VAX" (little-endian) order.
	  (These 'shorts' and 'longs' are _exactly_ 16 bits and
	   _exactly_ 32 bits, respectively.)

    q	A signed quad (64-bit) value.
    Q	An unsigned quad value.
	  (Quads are available only if your system supports 64-bit
	   integer values _and_ if Perl has been compiled to support those.
           Causes a fatal error otherwise.)

    f	A single-precision float in the native format.
    d	A double-precision float in the native format.

    p	A pointer to a null-terminated string.
    P	A pointer to a structure (fixed-length string).

    u	A uuencoded string.
    U	A Unicode character number.  Encodes to UTF-8 internally.
	Works even if 'use utf8' is not in effect.

    w	A BER compressed integer.  Its bytes represent an unsigned
	integer in base 128, most significant digit first, with as
        few digits as possible.  Bit eight (the high bit) is set
        on each byte except the last.

    x	A null byte.
    X	Back up a byte.
    @	Null fill to absolute position.

The following rules apply:

=over 8

=item *

Each letter may optionally be followed by a number giving a repeat
count.  With all types except C<a>, C<A>, C<Z>, C<b>, C<B>, C<h>,
C<H>, and C<P> the pack function will gobble up that many values from
the LIST.  A C<*> for the repeat count means to use however many items are
left, except for C<@>, C<x>, C<X>, where it is equivalent
to C<0>, and C<u>, where it is equivalent to 1 (or 45, which is the
same).

When used with C<Z>, C<*> results in the addition of a trailing null
byte (so the packed result will be one longer than the byte C<length>
of the item).

The repeat count for C<u> is interpreted as the maximal number of bytes
to encode per line of output, with 0 and 1 replaced by 45.

=item *

The C<a>, C<A>, and C<Z> types gobble just one value, but pack it as a
string of length count, padding with nulls or spaces as necessary.  When
unpacking, C<A> strips trailing spaces and nulls, C<Z> strips everything
after the first null, and C<a> returns data verbatim.  When packing,
C<a> and C<Z> are equivalent.

If the value-to-pack is too long, it is truncated.  If too long and an
explicit count is provided, C<Z> packs only C<$count-1> bytes, followed
by a null byte.  Thus C<Z> always packs a trailing null byte under
all circumstances.

=item *

Likewise, the C<b> and C<B> fields pack a string that many bits long.
Each byte of the input field of pack() generates 1 bit of the result.
Each result bit is based on the least-significant bit of the corresponding
input byte, i.e., on C<ord($byte)%2>.  In particular, bytes C<"0"> and
C<"1"> generate bits 0 and 1, as do bytes C<"\0"> and C<"\1">.

Starting from the beginning of the input string of pack(), each 8-tuple
of bytes is converted to 1 byte of output.  With format C<b>
the first byte of the 8-tuple determines the least-significant bit of a
byte, and with format C<B> it determines the most-significant bit of
a byte.

If the length of the input string is not exactly divisible by 8, the
remainder is packed as if the input string were padded by null bytes
at the end.  Similarly, during unpack()ing the "extra" bits are ignored.

If the input string of pack() is longer than needed, extra bytes are ignored.
A C<*> for the repeat count of pack() means to use all the bytes of
the input field.  On unpack()ing the bits are converted to a string
of C<"0">s and C<"1">s.

=item *

The C<h> and C<H> fields pack a string that many nybbles (4-bit groups,
representable as hexadecimal digits, 0-9a-f) long.

Each byte of the input field of pack() generates 4 bits of the result.
For non-alphabetical bytes the result is based on the 4 least-significant
bits of the input byte, i.e., on C<ord($byte)%16>.  In particular,
bytes C<"0"> and C<"1"> generate nybbles 0 and 1, as do bytes
C<"\0"> and C<"\1">.  For bytes C<"a".."f"> and C<"A".."F"> the result
is compatible with the usual hexadecimal digits, so that C<"a"> and
C<"A"> both generate the nybble C<0xa==10>.  The result for bytes
C<"g".."z"> and C<"G".."Z"> is not well-defined.

Starting from the beginning of the input string of pack(), each pair
of bytes is converted to 1 byte of output.  With format C<h> the
first byte of the pair determines the least-significant nybble of the
output byte, and with format C<H> it determines the most-significant
nybble.

If the length of the input string is not even, it behaves as if padded
by a null byte at the end.  Similarly, during unpack()ing the "extra"
nybbles are ignored.

If the input string of pack() is longer than needed, extra bytes are ignored.
A C<*> for the repeat count of pack() means to use all the bytes of
the input field.  On unpack()ing the bits are converted to a string
of hexadecimal digits.

=item *

The C<p> type packs a pointer to a null-terminated string.  You are
responsible for ensuring the string is not a temporary value (which can
potentially get deallocated before you get around to using the packed result).
The C<P> type packs a pointer to a structure of the size indicated by the
length.  A NULL pointer is created if the corresponding value for C<p> or
C<P> is C<undef>, similarly for unpack().

=item *

The C</> template character allows packing and unpacking of strings where
the packed structure contains a byte count followed by the string itself.
You write I<length-item>C</>I<string-item>.

The I<length-item> can be any C<pack> template letter,
and describes how the length value is packed.
The ones likely to be of most use are integer-packing ones like
C<n> (for Java strings), C<w> (for ASN.1 or SNMP)
and C<N> (for Sun XDR).

The I<string-item> must, at present, be C<"A*">, C<"a*"> or C<"Z*">.
For C<unpack> the length of the string is obtained from the I<length-item>,
but if you put in the '*' it will be ignored.

    unpack 'C/a', "\04Gurusamy";        gives 'Guru'
    unpack 'a3/A* A*', '007 Bond  J ';  gives (' Bond','J')
    pack 'n/a* w/a*','hello,','world';  gives "\000\006hello,\005world"

The I<length-item> is not returned explicitly from C<unpack>.

Adding a count to the I<length-item> letter is unlikely to do anything
useful, unless that letter is C<A>, C<a> or C<Z>.  Packing with a
I<length-item> of C<a> or C<Z> may introduce C<"\000"> characters,
which Perl does not regard as legal in numeric strings.

=item *

The integer types C<s>, C<S>, C<l>, and C<L> may be
immediately followed by a C<!> suffix to signify native shorts or
longs.  As described above, a bare C<l> means exactly 32 bits, whereas
the native C<long> (as seen by the local C compiler) may be larger.
This is an issue mainly on 64-bit platforms.  You can see whether
using C<!> makes any difference by

	print length(pack("s")), " ", length(pack("s!")), "\n";
	print length(pack("l")), " ", length(pack("l!")), "\n";

C<i!> and C<I!> also work, but only for completeness;
they are identical to C<i> and C<I>.

The actual sizes (in bytes) of native shorts, ints, longs, and long
longs on the platform where Perl was built are also available via
L<Config>:

       use Config;
       print $Config{shortsize},    "\n";
       print $Config{intsize},      "\n";
       print $Config{longsize},     "\n";
       print $Config{longlongsize}, "\n";

(The C<$Config{longlongsize}> will be undefined if your system does
not support long longs.)

=item *

The integer formats C<s>, C<S>, C<i>, C<I>, C<l>, and C<L>
are inherently non-portable between processors and operating systems
because they obey the native byteorder and endianness.  For example, a
4-byte integer 0x12345678 (305419896 decimal) would be ordered natively
(arranged in and handled by the CPU registers) into bytes as

 	0x12 0x34 0x56 0x78	# big-endian
 	0x78 0x56 0x34 0x12	# little-endian

Basically, the Intel and VAX CPUs are little-endian, while everybody
else, for example Motorola m68k/88k, PPC, Sparc, HP PA, Power, and
Cray are big-endian.  Alpha and MIPS can be either: Digital/Compaq
used/uses them in little-endian mode; SGI/Cray uses them in big-endian mode.

The names `big-endian' and `little-endian' are comic references to
the classic "Gulliver's Travels" (via the paper "On Holy Wars and a
Plea for Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980) and
the egg-eating habits of the Lilliputians.

Some systems may have even weirder byte orders such as

 	0x56 0x78 0x12 0x34
 	0x34 0x12 0x78 0x56

You can see your system's preference with

 	print join(" ", map { sprintf "%#02x", $_ }
                            unpack("C*",pack("L",0x12345678))), "\n";

The byteorder on the platform where Perl was built is also available
via L<Config>:

	use Config;
	print $Config{byteorder}, "\n";

Byteorders C<'1234'> and C<'12345678'> are little-endian, C<'4321'>
and C<'87654321'> are big-endian.

If you want portable packed integers, use the formats C<n>, C<N>,
C<v>, and C<V>; their byte endianness and size are known.
See also L<perlport>.

=item *

Real numbers (floats and doubles) are in the native machine format only;
due to the multiplicity of floating formats around, and the lack of a
standard "network" representation, no facility for interchange has been
made.  This means that packed floating point data written on one machine
may not be readable on another - even if both use IEEE floating point
arithmetic (as the endian-ness of the memory representation is not part
of the IEEE spec).  See also L<perlport>.

Note that Perl uses doubles internally for all numeric calculation, and
converting from double into float and thence back to double again will
lose precision (i.e., C<unpack("f", pack("f", $foo))> will not in general
equal $foo).

=item *

If the pattern begins with a C<U>, the resulting string will be treated
as Unicode-encoded. You can force UTF8 encoding on in a string with an
initial C<U0>, and the bytes that follow will be interpreted as Unicode
characters. If you don't want this to happen, you can begin your pattern
with C<C0> (or anything else) to force Perl not to UTF8 encode your
string, and then follow this with a C<U*> somewhere in your pattern.

=item *

You must do any alignment or padding yourself, for example by inserting
enough C<'x'>es while packing.  There is no way that pack() and unpack()
could know where the bytes are going to or coming from.  Therefore
C<pack> (and C<unpack>) handle their output and input as flat
sequences of bytes.

=item *

A comment in a TEMPLATE starts with C<#> and goes to the end of line.

=item *

If TEMPLATE requires more arguments to pack() than actually given, pack()
assumes additional C<""> arguments.  If TEMPLATE requires fewer arguments
to pack() than actually given, extra arguments are ignored.

=back
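
The padding and stripping rules for C<a>, C<A>, and C<Z> given above
can be checked with a short sketch (the variable names are illustrative
only):

    # packing: pad or truncate to the count
    my $a5 = pack("a5", "hi");          # "hi\0\0\0"
    my $A5 = pack("A5", "hi");          # "hi   "
    my $Z3 = pack("Z3", "hello");       # "he\0"
    my $Zs = pack("Z*", "hi");          # "hi\0", one byte longer

    # unpacking: A strips, Z stops at the first null, a is verbatim
    my ($stripped) = unpack("A5", "hi   ");     # "hi"
    my ($cstring)  = unpack("Z5", "hi\0xy");    # "hi"
    my ($verbatim) = unpack("a5", "hi\0xy");    # "hi\0xy"

    print length($Zs), "\n";            # 3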

Examples:

    $foo = pack("CCCC",65,66,67,68);
    # foo eq "ABCD"
    $foo = pack("C4",65,66,67,68);
    # same thing
    $foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
    # same thing with Unicode circled letters

    $foo = pack("ccxxcc",65,66,67,68);
    # foo eq "AB\0\0CD"

    # note: the above examples featuring "C" and "c" are true
    # only on ASCII and ASCII-derived systems such as ISO Latin 1
    # and UTF-8.  In EBCDIC the first example would be
    # $foo = pack("CCCC",193,194,195,196);

    $foo = pack("s2",1,2);
    # "\1\0\2\0" on little-endian
    # "\0\1\0\2" on big-endian

    $foo = pack("a4","abcd","x","y","z");
    # "abcd"

    $foo = pack("aaaa","abcd","x","y","z");
    # "axyz"

    $foo = pack("a14","abcdefg");
    # "abcdefg\0\0\0\0\0\0\0"

    $foo = pack("i9pl", gmtime);
    # a real struct tm (on my system anyway)

    $utmp_template = "Z8 Z8 Z16 L";
    $utmp = pack($utmp_template, @utmp1);
    # a struct utmp (BSDish)

    @utmp2 = unpack($utmp_template, $utmp);
    # "@utmp1" eq "@utmp2"

    sub bintodec {
	unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
    }

    $foo = pack('sx2l', 12, 34);
    # short 12, two zero bytes padding, long 34
    $bar = pack('s@4l', 12, 34);
    # short 12, zero fill to position 4, long 34
    # $foo eq $bar

The same template may generally also be used in unpack().
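
A few more one-liners, offered as a sketch, cover the bit-string,
hex-string, and portable integer formats.  The C<"A"> and C<"AB">
results assume an ASCII-based character set.

    # b and B: eight input characters make one output byte
    print pack("B8", "01000001"), "\n";     # A (high bit first)
    print pack("b8", "10000010"), "\n";     # A (low bit first)

    # h and H: two input characters make one output byte
    print pack("H4", "4142"), "\n";         # AB (high nybble first)
    print pack("h4", "1424"), "\n";         # AB (low nybble first)

    # n, N, v, V have a fixed size and byte order on every platform
    print unpack("H*", pack("n", 0x1234)), "\n";        # 1234
    print unpack("H*", pack("v", 0x1234)), "\n";        # 3412
    print unpack("H*", pack("N", 0x12345678)), "\n";    # 12345678
    print unpack("H*", pack("V", 0x12345678)), "\n";    # 78563412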

=item package NAMESPACE

=item package 

Declares the compilation unit as being in the given namespace.  The scope
of the package declaration is from the declaration itself through the end
of the enclosing block, file, or eval (the same as the C<my> operator).
All further unqualified dynamic identifiers will be in this namespace.
A package statement affects only dynamic variables--including those
you've used C<local> on--but I<not> lexical variables, which are created
with C<my>.  Typically it would be the first declaration in a file to
be included by the C<require> or C<use> operator.  You can switch into a
package in more than one place; it merely influences which symbol table
is used by the compiler for the rest of that block.  You can refer to
variables and filehandles in other packages by prefixing the identifier
with the package name and a double colon:  C<$Package::Variable>.
If the package name is null, the C<main> package is assumed.  That is,
C<$::sail> is equivalent to C<$main::sail> (as well as to C<$main'sail>,
still seen in older code).
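
A minimal sketch of these scoping rules follows.  The package name
C<Boat> and the variable values are invented for the example.

    use strict;
    use warnings;

    our $sail = "mainsail";          # $main::sail

    {
        package Boat;                # in effect until the closing brace
        our $sail = "jib";           # $Boat::sail
        print "$sail\n";             # jib
        print "$main::sail\n";       # mainsail
        print "$::sail\n";           # mainsail
    }

    print "$sail\n";                 # mainsail
    print "$Boat::sail\n";           # jib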

If NAMESPACE is omitted, then there is no current package, and all
identifiers must be fully qualified or lexicals.  This is stricter
than C<use strict>, since it also extends to function names.

See L<perlmod/"Packages"> for more information about packages, modules,
and classes.  See L<perlsub> for other scoping issues.

=item pipe READHANDLE,WRITEHANDLE

Opens a pair of connected pipes like the corresponding system call.
Note that if you set up a loop of piped processes, deadlock can occur
unless you are very careful.  In addition, note that Perl's pipes use
stdio buffering, so you may need to set C<$|> to flush your WRITEHANDLE
after each command, depending on the application.

See L<IPC::Open2>, L<IPC::Open3>, and L<perlipc/"Bidirectional Communication">
for examples of such things.

On systems that support a close-on-exec flag on files, the flag will be set
for the newly opened file descriptors as determined by the value of $^F.
See L<perlvar/$^F>.
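
The simplest safe arrangement is a one-way pipe between a parent and a
single child, sketched below.  The handle names C<READER> and C<WRITER>
are arbitrary; each side closes the end it does not use, so the reader
sees end-of-file once the writer is done.

    use strict;
    use warnings;

    pipe(READER, WRITER) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                 # child: writes only
        close(READER);
        print WRITER "hello from child $$\n";
        close(WRITER);               # flushes the buffered output
        exit(0);
    }

    close(WRITER);                   # parent: reads only
    my $line = <READER>;
    close(READER);
    waitpid($pid, 0);

    print "parent read: $line";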

=item pop ARRAY

=item pop

Pops and returns the last value of the array, shortening the array by
one element.  Has an effect similar to

    $ARRAY[$#ARRAY--]

If there are no elements in the array, returns the undefined value
(although this may happen at other times as well).  If ARRAY is
omitted, pops the C<@ARGV> array in the main program, and the C<@_>
array in subroutines, just like C<shift>.

=item pos SCALAR

=item pos

Returns the offset of where the last C<m//g> search left off for the variable
in question (C<$_> is used when the variable is not specified).  May be
modified to change that offset.  Such modification will also influence
the C<\G> zero-width assertion in regular expressions.  See L<perlre> and
L<perlop>.
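
For instance, this small sketch reads the offset after one match, moves
it by hand, and then resumes matching from the new offset with C<\G>.
The sample string is arbitrary.

    use strict;
    use warnings;

    my $text = "aaabbbccc";

    $text =~ /a+/g;                  # m//g remembers where it stopped
    my $after_a = pos($text);        # 3

    pos($text) = 6;                  # skip over the b's by hand
    my $matched = $text =~ /\Gc+/g;  # \G anchors the match at offset 6
    my $after_c = pos($text);        # 9

    print "$after_a $after_c\n";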

=item print FILEHANDLE LIST

=item print LIST

=item print

Prints a string or a list of strings.  Returns true if successful.
FILEHANDLE may be a scalar variable name, in which case the variable
contains the name of or a reference to the filehandle, thus introducing
one level of indirection.  (NOTE: If FILEHANDLE is a variable and
the next token is a term, it may be misinterpreted as an operator
unless you interpose a C<+> or put parentheses around the arguments.)
If FILEHANDLE is omitted, prints by default to standard output (or
to the last selected output channel--see L</select>).  If LIST is
also omitted, prints C<$_> to the currently selected output channel.
To set the default output channel to something other than STDOUT
use the select operation.  The current value of C<$,> (if any) is
printed between each LIST item.  The current value of C<$\> (if
any) is printed after the entire LIST has been printed.  Because
print takes a LIST, anything in the LIST is evaluated in list
context, and any subroutine that you call will have one or more of
its expressions evaluated in list context.  Also be careful not to
follow the print keyword with a left parenthesis unless you want
the corresponding right parenthesis to terminate the arguments to
the print--interpose a C<+> or put parentheses around all the
arguments.

Note that if you're storing FILEHANDLES in an array or other expression,
you will have to use a block returning its value instead:

    print { $files[$i] } "stuff\n";
    print { $OK ? STDOUT : STDERR } "stuff\n";

=item printf FILEHANDLE FORMAT, LIST

=item printf FORMAT, LIST

Equivalent to C<print FILEHANDLE sprintf(FORMAT, LIST)>, except that C<$\>
(the output record separator) is not appended.  The first argument
of the list will be interpreted as the C<printf> format.  If C<use locale> is
in effect, the character used for the decimal point in formatted real numbers
is affected by the LC_NUMERIC locale.  See L<perllocale>.

Don't fall into the trap of using a C<printf> when a simple
C<print> would do.  The C<print> is more efficient and less
error prone.

=item prototype FUNCTION

Returns the prototype of a function as a string (or C<undef> if the
function has no prototype).  FUNCTION is a reference to, or the name of,
the function whose prototype you want to retrieve.

If FUNCTION is a string starting with C<CORE::>, the rest is taken as the
name of a Perl builtin.  If the builtin is not I<overridable> (such as
C<qw//>) or its arguments cannot be expressed by a prototype (such as
C<system>), C<prototype> returns C<undef> because the builtin does not
really behave like a Perl function.  Otherwise, the string describing the equivalent
prototype is returned.
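
A short sketch of the common cases follows.  The three subroutines are
hypothetical and exist only so there is something to inspect.

    use strict;
    use warnings;

    # hypothetical helper subs, declared only to have something to inspect
    sub mymax($$)   { $_[0] > $_[1] ? $_[0] : $_[1] }
    sub mypush(\@@) { my $aref = shift; push @$aref, @_ }
    sub plain       { 1 }

    my $p_max  = prototype(\&mymax);            # '$$'
    my $p_push = prototype("main::mypush");     # '\@@'
    my $p_none = prototype(\&plain);            # undef: no prototype
    my $p_sys  = prototype("CORE::system");     # undef: not expressible

    print "mymax:  $p_max\n";
    print "mypush: $p_push\n";
    print "plain:  ", defined($p_none) ? $p_none : "undef", "\n";
    print "system: ", defined($p_sys)  ? $p_sys  : "undef", "\n";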

=item push ARRAY,LIST

Treats ARRAY as a stack, and pushes the values of LIST
onto the end of ARRAY.  The length of ARRAY increases by the length of
LIST.  Has the same effect as

    for $value (LIST) {
	$ARRAY[++$#ARRAY] = $value;
    }

but is more efficient.  Returns the new number of elements in the array.
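
For example, in this small sketch the return value is the new element
count, not the last value pushed:

    use strict;
    use warnings;

    my @stack = (1, 2);
    my $count = push(@stack, 3, 4, 5);

    print "$count: @stack\n";        # 5: 1 2 3 4 5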

=item q/STRING/

=item qq/STRING/

=item qr/STRING/

=item qx/STRING/

=item qw/STRING/

Generalized quotes.  See L<perlop/"Regexp Quote-Like Operators">.

=item quotemeta EXPR

=item quotemeta

Returns the value of EXPR with all non-"word"
characters backslashed.  (That is, all characters not matching
C</[A-Za-z_0-9]/> will be preceded by a backslash in the
returned string, regardless of any locale settings.)
This is the internal function implementing
the C<\Q> escape in double-quoted strings.

If EXPR is omitted, uses C<$_>.
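
The following sketch shows why this matters when a string is
interpolated into a pattern.  The sample strings are arbitrary.

    use strict;
    use warnings;

    my $literal = "a.b*c";
    my $quoted  = quotemeta($literal);          # a\.b\*c

    # unquoted, "." and "*" act as metacharacters
    my $loose  = ("aXbbbc" =~ /^$literal$/) ? 1 : 0;     # 1
    # quoted, only the literal text matches
    my $strict = ("aXbbbc" =~ /^$quoted$/)  ? 1 : 0;     # 0
    # \Q...\E inside the pattern has the same effect
    my $inline = ("a.b*c" =~ /^\Q$literal\E$/) ? 1 : 0;  # 1

    print "$quoted $loose $strict $inline\n";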

=item rand EXPR

=item rand

Returns a random fractional number greater than or equal to C<0> and less
than the value of EXPR.  (EXPR should be positive.)  If EXPR is
omitted, the value C<1> is used.  Automatically calls C<srand> unless
C<srand> has already been called.  See also C<srand>.

(Note: If your rand function consistently returns numbers that are too
large or too small, then your version of Perl was probably compiled
with the wrong number of RANDBITS.)
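
Two common uses, sketched with made-up data: rolling a die and picking
a random element of an array.  An array subscript is truncated to an
integer, so no explicit C<int> is needed in the second case.

    use strict;
    use warnings;

    my $roll = int(rand(6)) + 1;          # integer from 1 to 6

    my @colors = qw(red green blue);      # hypothetical list
    my $pick   = $colors[rand @colors];   # rand @colors is in [0, 3)

    print "rolled $roll, picked $pick\n";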

=item read FILEHANDLE,SCALAR,LENGTH,OFFSET

=item read FILEHANDLE,SCALAR,LENGTH

Attempts to read LENGTH bytes of data into variable SCALAR from the
specified FILEHANDLE.  Returns the number of bytes actually read, C<0>
at end of file, or undef if there was an error.  SCALAR will be grown
or shrunk to the length actually read.  If SCALAR needs growing, the
new bytes will be zero bytes.  An OFFSET may be specified to place
the read data into some other place in SCALAR than the beginning.
The call is actually implemented in terms of stdio's fread(3) call.
To get a true read(2) system call, see C<sysread>.

=item readdir DIRHANDLE

Returns the next directory entry for a directory opened by C<opendir>.
If used in list context, returns all the rest of the entries in the
directory.  If there are no more entries, returns an undefined value in
scalar context or a null list in list context.

If you're planning to filetest the return values out of a C<readdir>, you'd
better prepend the directory in question.  Otherwise, because you didn't
C<chdir> there, you would be testing the wrong file.

    opendir(DIR, $some_dir) || die "can't opendir $some_dir: $!";
    @dots = grep { /^\./ && -f "$some_dir/$_" } readdir(DIR);
    closedir DIR;

=item readline EXPR

Reads from the filehandle whose typeglob is contained in EXPR.  In scalar
context, each call reads and returns the next line, until end-of-file is
reached, whereupon the subsequent call returns undef.  In list context,
reads until end-of-file is reached and returns a list of lines.  Note that
the notion of "line" used here is however you may have defined it
with C<$/> (or C<$INPUT_RECORD_SEPARATOR>).  See L<perlvar/"$/">.

When C<$/> is set to C<undef> (i.e. file slurp mode) and readline() is
called in scalar context on an empty file, it returns C<''> the first
time, followed by C<undef> subsequently.

This is the internal function implementing the C<< <EXPR> >>
operator, but you can use it directly.  The C<< <EXPR> >>
operator is discussed in more detail in L<perlop/"I/O Operators">.

    $line = <STDIN>;
    $line = readline(*STDIN);		# same thing

=item readlink EXPR

=item readlink

Returns the value of a symbolic link, if symbolic links are
implemented.  If not, gives a fatal error.  If there is some system
error, returns the undefined value and sets C<$!> (errno).  If EXPR is
omitted, uses C<$_>.

=item readpipe EXPR

EXPR is executed as a system command.
The collected standard output of the command is returned.
In scalar context, it comes back as a single (potentially
multi-line) string.  In list context, returns a list of lines
(however you've defined lines with C<$/> or C<$INPUT_RECORD_SEPARATOR>).
This is the internal function implementing the C<qx/EXPR/>
operator, but you can use it directly.  The C<qx/EXPR/>
operator is discussed in more detail in L<perlop/"I/O Operators">.

=item recv SOCKET,SCALAR,LENGTH,FLAGS

Receives a message on a socket.  Attempts to receive LENGTH bytes of
data into variable SCALAR from the specified SOCKET filehandle.  SCALAR
will be grown or shrunk to the length actually read.  Takes the same
flags as the system call of the same name.  Returns the address of the
sender if SOCKET's protocol supports this; returns an empty string
otherwise.  If there's an error, returns the undefined value.  This call
is actually implemented in terms of the recvfrom(2) system call.  See
L<perlipc/"UDP: Message Passing"> for examples.

=item redo LABEL

=item redo

The C<redo> command restarts the loop block without evaluating the
conditional again.  The C<continue> block, if any, is not executed.  If
the LABEL is omitted, the command refers to the innermost enclosing
loop.  This command is normally used by programs that want to lie to
themselves about what was just input:

    # a simpleminded Pascal comment stripper
    # (warning: assumes no { or } in strings)
    LINE: while (<STDIN>) {
	while (s|({.*}.*){.*}|$1 |) {}
	s|{.*}| |;
	if (s|{.*| |) {
	    $front = $_;
	    while (<STDIN>) {
		if (/}/) {	# end of comment?
		    s|^|$front\{|;
		    redo LINE;
		}
	    }
	}
	print;
    }

C<redo> cannot be used to retry a block which returns a value such as
C<eval {}>, C<sub {}> or C<do {}>, and should not be used to exit
a grep() or map() operation.

Note that a block by itself is semantically identical to a loop
that executes once.  Thus C<redo> inside such a block will effectively
turn it into a looping construct.

See also L</continue> for an illustration of how C<last>, C<next>, and
C<redo> work.

=item ref EXPR

=item ref

Returns a true value if EXPR is a reference, false otherwise.  If EXPR
is not specified, C<$_> will be used.  The value returned depends on the
type of thing the reference is a reference to.
Builtin types include:

    SCALAR
    ARRAY
    HASH
    CODE
    REF
    GLOB
    LVALUE

If the referenced object has been blessed into a package, then that package
name is returned instead.  You can think of C<ref> as a C<typeof> operator.

    if (ref($r) eq "HASH") {
	print "r is a reference to a hash.\n";
    }
    unless (ref($r)) {
	print "r is not a reference at all.\n";
    }
    if (UNIVERSAL::isa($r, "HASH")) {  # for subclassing
	print "r is a reference to something that isa hash.\n";
    } 

See also L<perlref>.

=item rename OLDNAME,NEWNAME

Changes the name of a file; an existing file NEWNAME will be
clobbered.  Returns true for success, false otherwise.

Behavior of this function varies wildly depending on your system
implementation.  For example, it will usually not work across file system
boundaries, even though the system I<mv> command sometimes compensates
for this.  Other restrictions include whether it works on directories,
open files, or pre-existing files.  Check L<perlport> and either the
rename(2) manpage or equivalent system documentation for details.
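
One way to cope with the cross-filesystem case is to fall back on the
C<move> function from the standard File::Copy module, which copies and
then deletes when a plain rename is not possible.  This is only a
sketch; the file names are invented, and a temporary directory is used
so the example cleans up after itself.

    use strict;
    use warnings;
    use File::Copy qw(move);
    use File::Temp qw(tempdir);

    my $dir = tempdir(CLEANUP => 1);
    my ($old, $new) = ("$dir/draft.txt", "$dir/final.txt");  # hypothetical names

    open(my $out, '>', $old) or die "Can't create $old: $!";
    print $out "data\n";
    close($out);

    # try the cheap rename first; fall back to copy-and-delete
    rename($old, $new)
        or move($old, $new)
        or die "Can't move $old to $new: $!";

    print "moved\n" if -e $new && !-e $old;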

=item require VERSION

=item require EXPR

=item require

Demands some semantics specified by EXPR, or by C<$_> if EXPR is not
supplied.

If a VERSION is specified as a literal of the form v5.6.1,
demands that the current version of Perl (C<$^V> or $PERL_VERSION) be
at least as recent as that version, at run time.  (For compatibility
with older versions of Perl, a numeric argument will also be interpreted
as VERSION.)  Compare with L</use>, which can do a similar check at
compile time.

    require v5.6.1;	# run time version check
    require 5.6.1;	# ditto
    require 5.005_03;	# float version allowed for compatibility

Otherwise, demands that a library file be included if it hasn't already
been included.  The file is included via the do-FILE mechanism, which is
essentially just a variety of C<eval>.  Has semantics similar to the following
subroutine:

    sub require {
	my($filename) = @_;
	return 1 if $INC{$filename};
	my($realfilename,$result);
	ITER: {
	    foreach $prefix (@INC) {
		$realfilename = "$prefix/$filename";
		if (-f $realfilename) {
		    $INC{$filename} = $realfilename;
		    $result = do $realfilename;
		    last ITER;
		}
	    }
	    die "Can't find $filename in \@INC";
	}
	delete $INC{$filename} if $@ || !$result;
	die $@ if $@;
	die "$filename did not return true value" unless $result;
	return $result;
    }

Note that the file will not be included twice under the same specified
name.  The file must return true as the last statement to indicate
successful execution of any initialization code, so it's customary to
end such a file with C<1;> unless you're sure it'll return true
otherwise.  But it's better just to put the C<1;>, in case you add more
statements.

If EXPR is a bareword, the require assumes a "F<.pm>" extension and
replaces "F<::>" with "F</>" in the filename for you,
to make it easy to load standard modules.  This form of loading of
modules does not risk altering your namespace.

In other words, if you try this:

        require Foo::Bar;    # a splendid bareword 

The require function will actually look for the "F<Foo/Bar.pm>" file in the 
directories specified in the C<@INC> array.

But if you try this:

        $class = 'Foo::Bar';
        require $class;	     # $class is not a bareword
    #or
        require "Foo::Bar";  # not a bareword because of the ""

The require function will look for the "F<Foo::Bar>" file in the directories
listed in C<@INC> and will complain about not finding "F<Foo::Bar>" there.
In this case you can do:

        eval "require $class";

For a yet-more-powerful import facility, see L</use> and L<perlmod>.
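
The string-eval form of C<require> is also a convenient way to test
whether an optional module is installed.  In this sketch the second
module name is deliberately made up, and the pattern check is an added
precaution so that only strings that look like module names reach
C<eval>.

    use strict;
    use warnings;

    my @wanted = ('File::Spec', 'No::Such::Module::Here');  # second name is made up
    my %loaded;

    for my $class (@wanted) {
        # only eval strings that look like module names
        next unless $class =~ /^\w+(?:::\w+)*$/;
        $loaded{$class} = eval "require $class; 1" ? 1 : 0;
        print "$class: ",
              ($loaded{$class} ? "loaded" : "not available"), "\n";
    }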

=item reset EXPR

=item reset

Generally used in a C<continue> block at the end of a loop to clear
variables and reset C<??> searches so that they work again.  The
expression is interpreted as a list of single characters (hyphens
allowed for ranges).  All variables and arrays beginning with one of
those letters are reset to their pristine state.  If the expression is
omitted, one-match searches (C<?pattern?>) are reset to match again.  Resets
only variables or searches in the current package.  Always returns
1.  Examples:

    reset 'X';		# reset all X variables
    reset 'a-z';	# reset lower case variables
    reset;		# just reset ?one-time? searches

Resetting C<"A-Z"> is not recommended because you'll wipe out your
C<@ARGV> and C<@INC> arrays and your C<%ENV> hash.  Resets only package
variables--lexical variables are unaffected, but they clean themselves
up on scope exit anyway, so you'll probably want to use them instead.
See L</my>.

=item return EXPR

=item return

Returns from a subroutine, C<eval>, or C<do FILE> with the value 
given in EXPR.  Evaluation of EXPR may be in list, scalar, or void
context, depending on how the return value will be used, and the context
may vary from one execution to the next (see C<wantarray>).  If no EXPR
is given, returns an empty list in list context, the undefined value in
scalar context, and (of course) nothing at all in a void context.

(Note that in the absence of an explicit C<return>, a subroutine, eval,
or do FILE will automatically return the value of the last expression
evaluated.)
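
The context rules can be seen in this small sketch.  Both subroutine
names are hypothetical.

    use strict;
    use warnings;

    sub context {
        return "list"   if wantarray;           # called in list context
        return "scalar" if defined wantarray;   # called in scalar context
        return;                                 # void context
    }
    sub nothing { return }

    my @empty    = nothing();        # empty list
    my $undef    = nothing();        # undefined value
    my ($c_list) = context();        # "list"
    my $c_scalar = context();        # "scalar"

    print scalar(@empty), " $c_list $c_scalar\n";   # 0 list scalar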

=item reverse LIST

In list context, returns a list value consisting of the elements
of LIST in the opposite order.  In scalar context, concatenates the
elements of LIST and returns a string value with all characters
in the opposite order.

    print reverse <>;		# line tac, last line first

    undef $/;			# for efficiency of <>
    print scalar reverse <>;	# character tac, last line tsrif

This operator is also handy for inverting a hash, although there are some
caveats.  If a value is duplicated in the original hash, only one of those
can be represented as a key in the inverted hash.  Also, this has to
unwind one hash and build a whole new one, which may take some time
on a large hash, such as from a DBM file.

    %by_name = reverse %by_address;	# Invert the hash

=item rewinddir DIRHANDLE

Sets the current position to the beginning of the directory for the
C<readdir> routine on DIRHANDLE.

=item rindex STR,SUBSTR,POSITION

=item rindex STR,SUBSTR

Works just like index() except that it returns the position of the LAST
occurrence of SUBSTR in STR.  If POSITION is specified, returns the
last occurrence at or before that position.
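
For example, this sketch walks backwards through the slashes of a
made-up path:

    use strict;
    use warnings;

    my $path = "/usr/local/lib/perl";           # hypothetical path

    my $last = rindex($path, "/");              # 14
    my $prev = rindex($path, "/", $last - 1);   # 10
    my $base = substr($path, $last + 1);        # perl

    print "$last $prev $base\n";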

=item rmdir FILENAME

=item rmdir

Deletes the directory specified by FILENAME if that directory is empty.  If it
succeeds it returns true, otherwise it returns false and sets C<$!> (errno).  If
FILENAME is omitted, uses C<$_>.

=item s///

The substitution operator.  See L<perlop>.

=item scalar EXPR

Forces EXPR to be interpreted in scalar context and returns the value
of EXPR.

    @counts = ( scalar @a, scalar @b, scalar @c );

There is no equivalent operator to force an expression to
be interpolated in list context because in practice, this is never
needed.  If you really wanted to do so, however, you could use
the construction C<@{[ (some expression) ]}>, but usually a simple
C<(some expression)> suffices.

Because C<scalar> is a unary operator, if you accidentally use for EXPR a
parenthesized list, this behaves as a scalar comma expression, evaluating
all but the last element in void context and returning the final element
evaluated in scalar context.  This is seldom what you want.

The following single statement:

	print uc(scalar(&foo,$bar)),$baz;

is the moral equivalent of these two:

	&foo;
	print(uc($bar),$baz);

See L<perlop> for more details on unary operators and the comma operator.

=item seek FILEHANDLE,POSITION,WHENCE

Sets FILEHANDLE's position, just like the C<fseek> call of C<stdio>.
FILEHANDLE may be an expression whose value gives the name of the
filehandle.  The values for WHENCE are C<0> to set the new position to
POSITION, C<1> to set it to the current position plus POSITION, and
C<2> to set it to EOF plus POSITION (typically negative).  For WHENCE
you may use the constants C<SEEK_SET>, C<SEEK_CUR>, and C<SEEK_END>
(start of the file, current position, end of the file) from the Fcntl
module.  Returns C<1> upon success, C<0> otherwise.
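
The three WHENCE values can be tried out with a scratch file, as in
this sketch.  It uses the standard File::Temp module to create the
file, and the file contents are arbitrary.

    use strict;
    use warnings;
    use Fcntl qw(:seek);               # SEEK_SET, SEEK_CUR, SEEK_END
    use File::Temp qw(tempfile);

    my ($fh, $path) = tempfile(UNLINK => 1);   # scratch file for the demo
    print $fh "0123456789";

    seek($fh, 0, SEEK_SET)  or die "seek failed: $!";
    read($fh, my $head, 3);            # "012"

    seek($fh, -3, SEEK_END) or die "seek failed: $!";
    read($fh, my $tail, 3);            # "789"

    seek($fh, 2, SEEK_SET)  or die "seek failed: $!";
    seek($fh, 3, SEEK_CUR)  or die "seek failed: $!";
    read($fh, my $mid, 2);             # "56"
    close($fh);

    print "$head $tail $mid\n";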

If you want to position the file for C<sysread> or C<syswrite>, don't use
C<seek>--buffering makes its effect on the file's system position
unpredictable and non-portable.  Use C<sysseek> instead.

Due to the rules and rigors of ANSI C, on some systems you have to do a
seek whenever you switch between reading and writing.  Amongst other
things, this may have the effect of calling stdio's clearerr(3).
A WHENCE of C<1> (C<SEEK_CUR>) is useful for not moving the file position:

    seek(TEST,0,1);

This is also useful for applications emulating C<tail -f>.  Once you hit
EOF on your read, and then sleep for a while, you might have to stick in a
seek() to reset things.  The C<seek> doesn't change the current position,
but it I<does> clear the end-of-file condition on the handle, so that the
next C<< <FILE> >> makes Perl try again to read something.  We hope.

If that doesn't work (some stdios are particularly cantankerous), then
you may need something more like this:

    for (;;) {
	for ($curpos = tell(FILE); $_ = <FILE>;
             $curpos = tell(FILE)) {
	    # search for some stuff and put it into files
	}
	sleep($for_a_while);
	seek(FILE, $curpos, 0);
    }

=item seekdir DIRHANDLE,POS

Sets the current position for the C<readdir> routine on DIRHANDLE.  POS
must be a value returned by C<telldir>.  Has the same caveats about
possible directory compaction as the corresponding system library
routine.

=item select FILEHANDLE

=item select

Returns the currently selected filehandle.  Sets the current default
filehandle for output, if FILEHANDLE is supplied.  This has two
effects: first, a C<write> or a C<print> without a filehandle will
default to this FILEHANDLE.  Second, references to variables related to
output will refer to this output channel.  For example, if you have to
set the top of form format for more than one output channel, you might
do the following:

    select(REPORT1);
    $^ = 'report1_top';
    select(REPORT2);
    $^ = 'report2_top';

FILEHANDLE may be an expression whose value gives the name of the
actual filehandle.  Thus:

    $oldfh = select(STDERR); $| = 1; select($oldfh);

Some programmers may prefer to think of filehandles as objects with
methods, preferring to write the last example as:

    use IO::Handle;
    STDERR->autoflush(1);

=item select RBITS,WBITS,EBITS,TIMEOUT

This calls the select(2) system call with the bit masks specified, which
can be constructed using C<fileno> and C<vec>, along these lines:

    $rin = $win = $ein = '';
    vec($rin,fileno(STDIN),1) = 1;
    vec($win,fileno(STDOUT),1) = 1;
    $ein = $rin | $win;

If you want to select on many filehandles you might wish to write a
subroutine:

    sub fhbits {
	my(@fhlist) = split(' ',$_[0]);
	my($bits);
	for (@fhlist) {
	    vec($bits,fileno($_),1) = 1;
	}
	$bits;
    }
    $rin = fhbits('STDIN TTY SOCK');

The usual idiom is:

    ($nfound,$timeleft) =
      select($rout=$rin, $wout=$win, $eout=$ein, $timeout);

or, to block until something becomes ready, just do this:

    $nfound = select($rout=$rin, $wout=$win, $eout=$ein, undef);

Most systems do not bother to return anything useful in $timeleft, so
calling select() in scalar context just returns $nfound.

Any of the bit masks can also be undef.  The timeout, if specified, is
in seconds, which may be fractional.  Note: not all implementations are
capable of returning the $timeleft.  If not, they always return
$timeleft equal to the supplied $timeout.

You can effect a sleep of 250 milliseconds this way:

    select(undef, undef, undef, 0.25);

B<WARNING>: One should not attempt to mix buffered I/O (like C<read>
or C<< <FH> >>) with C<select>, except as permitted by POSIX, and even
then only on POSIX systems.  You have to use C<sysread> instead.

=item semctl ID,SEMNUM,CMD,ARG

Calls the System V IPC function C<semctl>.  You'll probably have to say

    use IPC::SysV;

first to get the correct constant definitions.  If CMD is IPC_STAT or
GETALL, then ARG must be a variable which will hold the returned
semid_ds structure or semaphore value array.  Returns like C<ioctl>:
the undefined value for error, "C<0 but true>" for zero, or the actual
return value otherwise.  The ARG must consist of a vector of native
short integers, which may be created with C<pack("s!*",(0)x$nsem)>.
See also L<perlipc/"SysV IPC">, C<IPC::SysV>, C<IPC::Semaphore>
documentation.

=item semget KEY,NSEMS,FLAGS

Calls the System V IPC function semget.  Returns the semaphore id, or
the undefined value if there is an error.  See also
L<perlipc/"SysV IPC">, C<IPC::SysV>, C<IPC::Semaphore>
documentation.

=item semop KEY,OPSTRING

Calls the System V IPC function semop to perform semaphore operations
such as signaling and waiting.  OPSTRING must be a packed array of
semop structures.  Each semop structure can be generated with
C<pack("sss", $semnum, $semop, $semflag)>.  The number of semaphore
operations is implied by the length of OPSTRING.  Returns true if
successful, or false if there is an error.  As an example, the
following code waits on semaphore $semnum of semaphore id $semid:

    $semop = pack("sss", $semnum, -1, 0);
    die "Semaphore trouble: $!\n" unless semop($semid, $semop);

To signal the semaphore, replace C<-1> with C<1>.  See also
L<perlipc/"SysV IPC">, C<IPC::SysV>, and C<IPC::Semaphore>
documentation.

=item send SOCKET,MSG,FLAGS,TO

=item send SOCKET,MSG,FLAGS

Sends a message on a socket.  Takes the same flags as the system call
of the same name.  On unconnected sockets you must specify a
destination to send TO, in which case it does a C C<sendto>.  Returns
the number of characters sent, or the undefined value if there is an
error.  The C system call sendmsg(2) is currently unimplemented.
See L<perlipc/"UDP: Message Passing"> for examples.

=item setpgrp PID,PGRP

Sets the current process group for the specified PID, C<0> for the current
process.  Will produce a fatal error if used on a machine that doesn't
implement POSIX setpgid(2) or BSD setpgrp(2).  If the arguments are omitted,
it defaults to C<0,0>.  Note that the BSD 4.2 version of C<setpgrp> does not
accept any arguments, so only C<setpgrp(0,0)> is portable.  See also
C<POSIX::setsid()>.

=item setpriority WHICH,WHO,PRIORITY

Sets the current priority for a process, a process group, or a user.
(See setpriority(2).)  Will produce a fatal error if used on a machine
that doesn't implement setpriority(2).

=item setsockopt SOCKET,LEVEL,OPTNAME,OPTVAL

Sets the socket option requested.  Returns undefined if there is an
error.  OPTVAL may be specified as C<undef> if you don't want to pass an
argument.

=item shift ARRAY

=item shift

Shifts the first value of the array off and returns it, shortening the
array by 1 and moving everything down.  If there are no elements in the
array, returns the undefined value.  If ARRAY is omitted, shifts the
C<@_> array within the lexical scope of subroutines and formats, and the
C<@ARGV> array at file scopes or within the lexical scopes established by
the C<eval ''>, C<BEGIN {}>, C<INIT {}>, C<CHECK {}>, and C<END {}>
constructs.

See also C<unshift>, C<push>, and C<pop>.  C<shift> and C<unshift> do the
same thing to the left end of an array that C<pop> and C<push> do to the
right end.
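
A brief sketch of both ends of an array, and of the usual idiom for
picking off subroutine arguments (the names are made up):

    use strict;
    use warnings;

    my @queue = qw(a b c);
    my $first = shift @queue;        # "a"; @queue is now (b c)
    unshift @queue, "z";             # @queue is now (z b c)

    sub greet {
        my $name = shift;            # implicitly shifts @_
        return "hello, $name";
    }

    print "$first @queue ", greet("world"), "\n";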

=item shmctl ID,CMD,ARG

Calls the System V IPC function shmctl.  You'll probably have to say

    use IPC::SysV;

first to get the correct constant definitions.  If CMD is C<IPC_STAT>,
then ARG must be a variable which will hold the returned C<shmid_ds>
structure.  Returns like ioctl: the undefined value for error,
"C<0 but true>" for zero, or the actual return value otherwise.
See also L<perlipc/"SysV IPC"> and C<IPC::SysV> documentation.

=item shmget KEY,SIZE,FLAGS

Calls the System V IPC function shmget.  Returns the shared memory
segment id, or the undefined value if there is an error.
See also L<perlipc/"SysV IPC"> and C<IPC::SysV> documentation.

=item shmread ID,VAR,POS,SIZE

=item shmwrite ID,STRING,POS,SIZE

Reads or writes the System V shared memory segment ID starting at
position POS for size SIZE by attaching to it, copying in/out, and
detaching from it.  When reading, VAR must be a variable that will
hold the data read.  When writing, if STRING is too long, only SIZE
bytes are used; if STRING is too short, nulls are written to fill out
SIZE bytes.  Returns true if successful, or false if there is an error.
shmread() taints the variable. See also L<perlipc/"SysV IPC">,
C<IPC::SysV> documentation, and the C<IPC::Shareable> module from CPAN.

=item shutdown SOCKET,HOW

Shuts down a socket connection in the manner indicated by HOW, which
has the same interpretation as in the system call of the same name.

    shutdown(SOCKET, 0);    # I/we have stopped reading data
    shutdown(SOCKET, 1);    # I/we have stopped writing data
    shutdown(SOCKET, 2);    # I/we have stopped using this socket

This is useful with sockets when you want to tell the other
side you're done writing but not done reading, or vice versa.
It's also a more insistent form of close because it also 
disables the file descriptor in any forked copies in other
processes.

=item sin EXPR

=item sin

Returns the sine of EXPR (expressed in radians).  If EXPR is omitted,
returns sine of C<$_>.

For the inverse sine operation, you may use the C<Math::Trig::asin>
function, or use this relation:

    sub asin { atan2($_[0], sqrt(1 - $_[0] * $_[0])) }

=item sleep EXPR

=item sleep

Causes the script to sleep for EXPR seconds, or forever if no EXPR.
May be interrupted if the process receives a signal such as C<SIGALRM>.
Returns the number of seconds actually slept.  You probably cannot
mix C<alarm> and C<sleep> calls, because C<sleep> is often implemented
using C<alarm>.

On some older systems, it may sleep up to a full second less than what
you requested, depending on how it counts seconds.  Most modern systems
always sleep the full amount.  They may appear to sleep longer than that,
however, because your process might not be scheduled right away in a
busy multitasking system.

For delays of finer granularity than one second, you may use Perl's
C<syscall> interface to access setitimer(2) if your system supports
it, or else see L</select> above.  The Time::HiRes module from CPAN
may also help.

See also the POSIX module's C<pause> function.

=item socket SOCKET,DOMAIN,TYPE,PROTOCOL

Opens a socket of the specified kind and attaches it to filehandle
SOCKET.  DOMAIN, TYPE, and PROTOCOL are specified the same as for
the system call of the same name.  You should C<use Socket> first
to get the proper definitions imported.  See the examples in
L<perlipc/"Sockets: Client/Server Communication">.

On systems that support a close-on-exec flag on files, the flag will
be set for the newly opened file descriptor, as determined by the
value of $^F.  See L<perlvar/$^F>.

=item socketpair SOCKET1,SOCKET2,DOMAIN,TYPE,PROTOCOL

Creates an unnamed pair of sockets in the specified domain, of the
specified type.  DOMAIN, TYPE, and PROTOCOL are specified the same as
for the system call of the same name.  If unimplemented, yields a fatal
error.  Returns true if successful.

On systems that support a close-on-exec flag on files, the flag will
be set for the newly opened file descriptors, as determined by the value
of $^F.  See L<perlvar/$^F>.

Some systems define C<pipe> in terms of C<socketpair>, in which case a call
to C<pipe(Rdr, Wtr)> is essentially:

    use Socket;
    socketpair(Rdr, Wtr, AF_UNIX, SOCK_STREAM, PF_UNSPEC);
    shutdown(Rdr, 1);        # no more writing for reader
    shutdown(Wtr, 0);        # no more reading for writer

See L<perlipc> for an example of socketpair use.

=item sort SUBNAME LIST

=item sort BLOCK LIST

=item sort LIST

Sorts the LIST and returns the sorted list value.  If SUBNAME or BLOCK
is omitted, C<sort>s in standard string comparison order.  If SUBNAME is
specified, it gives the name of a subroutine that returns an integer
less than, equal to, or greater than C<0>, depending on how the elements
of the list are to be ordered.  (The C<< <=> >> and C<cmp>
operators are extremely useful in such routines.)  SUBNAME may be a
scalar variable name (unsubscripted), in which case the value provides
the name of (or a reference to) the actual subroutine to use.  In place
of a SUBNAME, you can provide a BLOCK as an anonymous, in-line sort
subroutine.

If the subroutine's prototype is C<($$)>, the elements to be compared
are passed by reference in C<@_>, as for a normal subroutine.  This is
slower than unprototyped subroutines, where the elements to be
compared are passed into the subroutine
as the package global variables $a and $b (see example below).  Note that
in the latter case, it is usually counter-productive to declare $a and
$b as lexicals.

In either case, the subroutine may not be recursive.  The values to be
compared are always passed by reference, so don't modify them.

You also cannot exit out of the sort block or subroutine using any of the
loop control operators described in L<perlsyn> or with C<goto>.

When C<use locale> is in effect, C<sort LIST> sorts LIST according to the
current collation locale.  See L<perllocale>.

Examples:

    # sort lexically
    @articles = sort @files;

    # same thing, but with explicit sort routine
    @articles = sort {$a cmp $b} @files;

    # now case-insensitively
    @articles = sort {uc($a) cmp uc($b)} @files;

    # same thing in reversed order
    @articles = sort {$b cmp $a} @files;

    # sort numerically ascending
    @articles = sort {$a <=> $b} @files;

    # sort numerically descending
    @articles = sort {$b <=> $a} @files;

    # this sorts the %age hash by value instead of key
    # using an in-line function
    @eldest = sort { $age{$b} <=> $age{$a} } keys %age;

    # sort using explicit subroutine name
    sub byage {
	$age{$a} <=> $age{$b};	# presuming numeric
    }
    @sortedclass = sort byage @class;

    sub backwards { $b cmp $a }
    @harry  = qw(dog cat x Cain Abel);
    @george = qw(gone chased yz Punished Axed);
    print sort @harry;
	    # prints AbelCaincatdogx
    print sort backwards @harry;
	    # prints xdogcatCainAbel
    print sort @george, 'to', @harry;
	    # prints AbelAxedCainPunishedcatchaseddoggonetoxyz

    # inefficiently sort by descending numeric compare using
    # the first integer after the first = sign, or the
    # whole record case-insensitively otherwise

    @new = sort {
	($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
			    ||
	            uc($a)  cmp  uc($b)
    } @old;

    # same thing, but much more efficiently;
    # we'll build auxiliary indices instead
    # for speed
    @nums = @caps = ();
    for (@old) {
	push @nums, /=(\d+)/;
	push @caps, uc($_);
    }

    @new = @old[ sort {
			$nums[$b] <=> $nums[$a]
				 ||
			$caps[$a] cmp $caps[$b]
		       } 0..$#old
	       ];

    # same thing, but without any temps
    @new = map { $_->[0] }
           sort { $b->[1] <=> $a->[1]
                           ||
                  $a->[2] cmp $b->[2]
           } map { [$_, /=(\d+)/, uc($_)] } @old;

    # using a prototype allows you to use any comparison subroutine
    # as a sort subroutine (including other package's subroutines)
    package other;
    sub backwards ($$) { $_[1] cmp $_[0]; }	# $a and $b are not set here

    package main;
    @new = sort other::backwards @old;

If you're using strict, you I<must not> declare $a
and $b as lexicals.  They are package globals.  That means
if you're in the C<main> package and type

    @articles = sort {$b <=> $a} @files;

then C<$a> and C<$b> are C<$main::a> and C<$main::b> (or C<$::a> and C<$::b>),
but if you're in the C<FooPack> package, it's the same as typing

    @articles = sort {$FooPack::b <=> $FooPack::a} @files;

The comparison function is required to behave.  If it returns
inconsistent results (sometimes saying C<$x[1]> is less than C<$x[2]> and
sometimes saying the opposite, for example) the results are not
well-defined.

=item splice ARRAY,OFFSET,LENGTH,LIST

=item splice ARRAY,OFFSET,LENGTH

=item splice ARRAY,OFFSET

=item splice ARRAY

Removes the elements designated by OFFSET and LENGTH from an array, and
replaces them with the elements of LIST, if any.  In list context,
returns the elements removed from the array.  In scalar context,
returns the last element removed, or C<undef> if no elements are
removed.  The array grows or shrinks as necessary.
If OFFSET is negative then it starts that far from the end of the array.
If LENGTH is omitted, removes everything from OFFSET onward.
If LENGTH is negative, leaves that many elements off the end of the array.
If both OFFSET and LENGTH are omitted, removes everything.
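
The following sketch walks through the negative OFFSET, negative LENGTH
and scalar-context cases described above.  The variable names are
illustrative only.

```perl
use strict;
use warnings;

my @a = (1 .. 10);

# Negative OFFSET: start two elements from the end and remove one.
my @removed = splice(@a, -2, 1);    # @a is now 1..8, 10

# Negative LENGTH: start at index 1 and leave the last two elements.
my @middle = splice(@a, 1, -2);     # @a is now 1, 8, 10

# Scalar context: the last element removed is returned.
my $last = splice(@a, 0, 2);        # @a is now just 10

print "@removed | @middle | $last | @a\n";
```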

The following equivalences hold (assuming C<$[ == 0>):

    push(@a,$x,$y)	splice(@a,@a,0,$x,$y)
    pop(@a)		splice(@a,-1)
    shift(@a)		splice(@a,0,1)
    unshift(@a,$x,$y)	splice(@a,0,0,$x,$y)
    $a[$x] = $y		splice(@a,$x,1,$y)

Example, assuming array lengths are passed before arrays:

    sub aeq {	# compare two list values
	my(@a) = splice(@_,0,shift);
	my(@b) = splice(@_,0,shift);
	return 0 unless @a == @b;	# same len?
	while (@a) {
	    return 0 if pop(@a) ne pop(@b);
	}
	return 1;
    }
    if (&aeq($len,@foo[1..$len],0+@bar,@bar)) { ... }

=item split /PATTERN/,EXPR,LIMIT

=item split /PATTERN/,EXPR

=item split /PATTERN/

=item split

Splits a string into a list of strings and returns that list.  By default,
empty leading fields are preserved, and empty trailing ones are deleted.

In scalar context, returns the number of fields found and splits into
the C<@_> array.  Use of split in scalar context is deprecated, however,
because it clobbers your subroutine arguments.

If EXPR is omitted, splits the C<$_> string.  If PATTERN is also omitted,
splits on whitespace (after skipping any leading whitespace).  Anything
matching PATTERN is taken to be a delimiter separating the fields.  (Note
that the delimiter may be longer than one character.)

If LIMIT is specified and positive, splits into no more than that
many fields (though it may split into fewer).  If LIMIT is unspecified
or zero, trailing null fields are stripped (which potential users
of C<pop> would do well to remember).  If LIMIT is negative, it is
treated as if an arbitrarily large LIMIT had been specified.
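
A short sketch of how LIMIT affects trailing empty fields, using a
made-up record in C<$line>:

```perl
use strict;
use warnings;

my $line = "a:b:c:::";

my @default = split(/:/, $line);        # trailing empty fields stripped
my @all     = split(/:/, $line, -1);    # trailing empty fields kept
my @two     = split(/:/, $line, 2);     # at most two fields

print scalar(@default), " ", scalar(@all), " ", scalar(@two), "\n";
```

A negative LIMIT is the usual choice when the number of fields in a
record is significant, for example when empty trailing columns must be
preserved.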

A pattern matching the null string (not to be confused with
a null pattern C<//>, which is just one member of the set of patterns
matching a null string) will split the value of EXPR into separate
characters at each point it matches that way.  For example:

    print join(':', split(/ */, 'hi there'));

produces the output 'h:i:t:h:e:r:e'.

Empty leading (or trailing) fields are produced when there are positive-width
matches at the beginning (or end) of the string; a zero-width match at the
beginning (or end) of the string does not produce an empty field.  For
example:

   print join(':', split(/(?=\w)/, 'hi there!'));

produces the output 'h:i :t:h:e:r:e!'.

The LIMIT parameter can be used to split a line partially:

    ($login, $passwd, $remainder) = split(/:/, $_, 3);

When assigning to a list, if LIMIT is omitted, Perl supplies a LIMIT
one larger than the number of variables in the list, to avoid
unnecessary work.  For the list above LIMIT would have been 4 by
default.  In time critical applications it behooves you not to split
into more fields than you really need.

If the PATTERN contains parentheses, additional list elements are
created from each matching substring in the delimiter.

    split(/([,-])/, "1-10,20", 3);

produces the list value

    (1, '-', 10, ',', 20)

If you had the entire header of a normal Unix email message in $header,
you could split it up into fields and their values this way:

    $header =~ s/\n\s+/ /g;  # fix continuation lines
    %hdrs   =  (UNIX_FROM => split /^(\S*?):\s*/m, $header);

The pattern C</PATTERN/> may be replaced with an expression to specify
patterns that vary at runtime.  (To do runtime compilation only once,
use C</$variable/o>.)

As a special case, specifying a PATTERN of space (C<' '>) will split on
white space just as C<split> with no arguments does.  Thus, C<split(' ')> can
be used to emulate B<awk>'s default behavior, whereas C<split(/ /)>
will give you as many null initial fields as there are leading spaces.
A C<split> on C</\s+/> is like a C<split(' ')> except that any leading
whitespace produces a null first field.  A C<split> with no arguments
really does a C<split(' ', $_)> internally.
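
To make the difference between the three whitespace patterns concrete,
here is a small sketch using a made-up string with two leading spaces:

```perl
use strict;
use warnings;

my $str = "  foo bar";

my @awk    = split(' ',    $str);   # leading whitespace skipped
my @ws     = split(/\s+/,  $str);   # one empty leading field
my @single = split(/ /,    $str);   # one empty field per leading space

print scalar(@awk), " ", scalar(@ws), " ", scalar(@single), "\n";
```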

A PATTERN of C</^/> is treated as if it were C</^/m>, since it isn't
much use otherwise.

Example:

    open(PASSWD, '/etc/passwd');
    while (<PASSWD>) {
        chomp;
        ($login, $passwd, $uid, $gid,
         $gcos, $home, $shell) = split(/:/);
	#...
    }


=item sprintf FORMAT, LIST

Returns a string formatted by the usual C<printf> conventions of the C
library function C<sprintf>.  See below for more details
and see L<sprintf(3)> or L<printf(3)> on your system for an explanation of
the general principles.

For example:

        # Format number with up to 8 leading zeroes
        $result = sprintf("%08d", $number);

        # Round number to 3 digits after decimal point
        $rounded = sprintf("%.3f", $number);

Perl does its own C<sprintf> formatting--it emulates the C
function C<sprintf>, but it doesn't use it (except for floating-point
numbers, and even then only the standard modifiers are allowed).  As a
result, any non-standard extensions in your local C<sprintf> are not
available from Perl.

Unlike C<printf>, C<sprintf> does not do what you probably mean when you
pass it an array as your first argument. The array is given scalar context,
and instead of using the 0th element of the array as the format, Perl will
use the count of elements in the array as the format, which is almost never
useful.
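
The pitfall can be seen in a short sketch.  The array C<@args> is
hypothetical; the second call shows one way to pass the format
separately.

```perl
use strict;
use warnings;

my @args = ("%s-%s", "a", "b");

# The array is evaluated in scalar context, so the format is "3".
my $wrong = sprintf(@args);

# Pass the format explicitly, then the remaining elements as the list.
my $right = sprintf($args[0], @args[1 .. $#args]);

print "$wrong $right\n";
```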

Perl's C<sprintf> permits the following universally-known conversions:

   %%	a percent sign
   %c	a character with the given number
   %s	a string
   %d	a signed integer, in decimal
   %u	an unsigned integer, in decimal
   %o	an unsigned integer, in octal
   %x	an unsigned integer, in hexadecimal
   %e	a floating-point number, in scientific notation
   %f	a floating-point number, in fixed decimal notation
   %g	a floating-point number, in %e or %f notation

In addition, Perl permits the following widely-supported conversions:

   %X	like %x, but using upper-case letters
   %E	like %e, but using an upper-case "E"
   %G	like %g, but with an upper-case "E" (if applicable)
   %b	an unsigned integer, in binary
   %p	a pointer (outputs the Perl value's address in hexadecimal)
   %n	special: *stores* the number of characters output so far
        into the next variable in the parameter list 

Finally, for backward (and we do mean "backward") compatibility, Perl
permits these unnecessary but widely-supported conversions:

   %i	a synonym for %d
   %D	a synonym for %ld
   %U	a synonym for %lu
   %O	a synonym for %lo
   %F	a synonym for %f

Note that the number of exponent digits in the scientific notation by
C<%e>, C<%E>, C<%g> and C<%G> for numbers with the modulus of the
exponent less than 100 is system-dependent: it may be three or less
(zero-padded as necessary).  In other words, 1.23 times ten to the
99th may be either "1.23e99" or "1.23e099".

Perl permits the following universally-known flags between the C<%>
and the conversion letter:

   space   prefix positive number with a space
   +       prefix positive number with a plus sign
   -       left-justify within the field
   0       use zeros, not spaces, to right-justify
   #       prefix non-zero octal with "0", non-zero hex with "0x"
   number  minimum field width
   .number "precision": digits after decimal point for
           floating-point, max length for string, minimum length
           for integer
   l       interpret integer as C type "long" or "unsigned long"
   h       interpret integer as C type "short" or "unsigned short"
           If no flags, interpret integer as C type "int" or "unsigned"

There are also two Perl-specific flags:

   V       interpret integer as Perl's standard integer type
   v       interpret string as a vector of integers, output as
           numbers separated either by dots, or by an arbitrary
	   string received from the argument list when the flag
	   is preceded by C<*>

Where a number would appear in the flags, an asterisk (C<*>) may be
used instead, in which case Perl uses the next item in the parameter
list as the given number (that is, as the field width or precision).
If a field width obtained through C<*> is negative, it has the same
effect as the C<-> flag: left-justification.
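
A brief sketch of the asterisk form, with arbitrary widths and values
chosen for illustration:

```perl
use strict;
use warnings;

my $right = sprintf("%*d",  6, 42);         # width 6, right-justified
my $left  = sprintf("%*d", -6, 42);         # negative width: left-justified
my $prec  = sprintf("%.*f", 2, 3.14159);    # precision taken from the list

print "[$right] [$left] [$prec]\n";
```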

The C<v> flag is useful for displaying ordinal values of characters
in arbitrary strings:

    printf "version is v%vd\n", $^V;		# Perl's version
    printf "address is %*vX\n", ":", $addr;	# IPv6 address
    printf "bits are %*vb\n", " ", $bits;	# random bitstring

If C<use locale> is in effect, the character used for the decimal
point in formatted real numbers is affected by the LC_NUMERIC locale.
See L<perllocale>.

If Perl understands "quads" (64-bit integers) (this requires
either that the platform natively support quads or that Perl
be specifically compiled to support quads), the characters

	d u o x X b i D U O

print quads, and they may optionally be preceded by

	ll L q

For example

	%lld %16LX %qo

You can find out whether your Perl supports quads via L<Config>:

	use Config;
	($Config{use64bitint} eq 'define' || $Config{longsize} == 8) &&
		print "quads\n";

If Perl understands "long doubles" (this requires that the platform
support long doubles), the flags

	e f g E F G

may optionally be preceded by

	ll L

For example

	%llf %Lg

You can find out whether your Perl supports long doubles via L<Config>:

	use Config;
	$Config{d_longdbl} eq 'define' && print "long doubles\n";

=item sqrt EXPR

=item sqrt

Returns the square root of EXPR.  If EXPR is omitted, returns the square
root of C<$_>.  Only works on non-negative operands, unless you've
loaded the standard Math::Complex module.

    use Math::Complex;
    print sqrt(-2);    # prints 1.4142135623731i

=item srand EXPR

=item srand

Sets the random number seed for the C<rand> operator.  If EXPR is
omitted, uses a semi-random value supplied by the kernel (if it supports
the F</dev/urandom> device) or based on the current time and process
ID, among other things.  In versions of Perl prior to 5.004 the default
seed was just the current C<time>.  This isn't a particularly good seed,
so many old programs supply their own seed value (often C<time ^ $$> or
C<time ^ ($$ + ($$ << 15))>), but that isn't necessary any more.

In fact, it's usually not necessary to call C<srand> at all, because if
it is not called explicitly, it is called implicitly at the first use of
the C<rand> operator.  However, this was not the case in versions of Perl
before 5.004, so if your script will run under older Perl versions, it
should call C<srand>.

Note that you need something much more random than the default seed for
cryptographic purposes.  Checksumming the compressed output of one or more
rapidly changing operating system status programs is the usual method.  For
example:

    srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip`);

If you're particularly concerned with this, see the C<Math::TrulyRandom>
module in CPAN.

Do I<not> call C<srand> multiple times in your program unless you know
exactly what you're doing and why you're doing it.  The point of the
function is to "seed" the C<rand> function so that C<rand> can produce
a different sequence each time you run your program.  Just do it once at the
top of your program, or you I<won't> get random numbers out of C<rand>!
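
The following sketch illustrates why reseeding defeats the purpose: the
same seed always restarts the same sequence.  The seed value 42 is
arbitrary and used here only to make the effect visible.

```perl
use strict;
use warnings;

# Seed, draw five numbers, then reseed with the same value and draw again.
srand(42);
my @first = map { int rand 100 } 1 .. 5;

srand(42);
my @second = map { int rand 100 } 1 .. 5;

print "same sequence\n" if "@first" eq "@second";
```

A fixed seed like this can be handy for reproducible tests, but it is
exactly what you do not want in a program that needs unpredictable
values.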

Frequently called programs (like CGI scripts) that simply use

    time ^ $$

for a seed can fall prey to the mathematical property that

    a^b == (a+1)^(b+1)

one-third of the time.  So don't do that.

=item stat FILEHANDLE

=item stat EXPR

=item stat

Returns a 13-element list giving the status info for a file, either
the file opened via FILEHANDLE, or named by EXPR.  If EXPR is omitted,
it stats C<$_>.  Returns a null list if the stat fails.  Typically used
as follows:

    ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
       $atime,$mtime,$ctime,$blksize,$blocks)
           = stat($filename);

Not all fields are supported on all filesystem types.  Here are the
meanings of the fields:

  0 dev      device number of filesystem
  1 ino      inode number
  2 mode     file mode  (type and permissions)
  3 nlink    number of (hard) links to the file
  4 uid      numeric user ID of file's owner
  5 gid      numeric group ID of file's owner
  6 rdev     the device identifier (special files only)
  7 size     total size of file, in bytes
  8 atime    last access time in seconds since the epoch
  9 mtime    last modify time in seconds since the epoch
 10 ctime    inode change time (NOT creation time!) in seconds since the epoch
 11 blksize  preferred block size for file system I/O
 12 blocks   actual number of blocks allocated

(The epoch was at 00:00 January 1, 1970 GMT.)

If stat is passed the special filehandle consisting of an underline, no
stat is done, but the current contents of the stat structure from the
last stat or filetest are returned.  Example:

    if (-x $file && (($d) = stat(_)) && $d < 0) {
	print "$file is executable NFS file\n";
    }

(This works only on machines for which the device number is negative
under NFS.)

Because the mode contains both the file type and its permissions, you
should mask off the file type portion and (s)printf using a C<"%o"> 
if you want to see the real permissions.

    $mode = (stat($filename))[2];
    printf "Permissions are %04o\n", $mode & 07777;

In scalar context, C<stat> returns a boolean value indicating success
or failure, and, if successful, sets the information associated with
the special filehandle C<_>.
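
As a minimal sketch tying these pieces together, the following creates
a scratch file (via C<File::Temp>, an assumption made only so that the
example is self-contained), reads its size and permission bits, and
then reuses the cached stat structure through C<_>:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file with known contents (5 bytes).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello";
close($fh) or die "close: $!";

my @st = stat($path) or die "stat: $!";
my $size  = $st[7];             # field 7: size in bytes
my $perms = $st[2] & 07777;     # field 2: mode, file type masked off

# No new system call here: _ refers to the last stat structure.
my $size_again = (stat(_))[7];

printf "%d bytes, mode %04o\n", $size, $perms;
```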

The File::stat module provides a convenient, by-name access mechanism:

    use File::stat;
    $sb = stat($filename);
    printf "File is %s, size is %s, perm %04o, mtime %s\n", 
	$filename, $sb->size, $sb->mode & 07777,
	scalar localtime $sb->mtime;

You can import symbolic mode constants (C<S_IF*>) and functions
(C<S_IS*>) from the Fcntl module:

    use Fcntl ':mode';

    $mode = (stat($filename))[2];

    $user_rwx      = ($mode & S_IRWXU) >> 6;
    $group_read    = ($mode & S_IRGRP) >> 3;
    $other_execute =  $mode & S_IXOTH;

    printf "Permissions are %04o\n", S_IMODE($mode);

    $is_setuid     =  $mode & S_ISUID;
    $is_directory  =  S_ISDIR($mode);

You could write the last two using the C<-u> and C<-d> operators.
The commonly available S_IF* constants are

    # Permissions: read, write, execute, for user, group, others.

    S_IRWXU S_IRUSR S_IWUSR S_IXUSR
    S_IRWXG S_IRGRP S_IWGRP S_IXGRP
    S_IRWXO S_IROTH S_IWOTH S_IXOTH

    # Setuid/Setgid/Stickiness.

    S_ISUID S_ISGID S_ISVTX S_ISTXT

    # File types.  Not necessarily all are available on your system.

    S_IFREG S_IFDIR S_IFLNK S_IFBLK S_IFCHR S_IFIFO S_IFSOCK S_IFWHT S_ENFMT

    # The following are compatibility aliases for S_IRUSR, S_IWUSR, S_IXUSR.

    S_IREAD S_IWRITE S_IEXEC

and the S_IF* functions are

    S_IMODE($mode)	the part of $mode containing the permission bits
			and the setuid/setgid/sticky bits

    S_IFMT($mode)	the part of $mode containing the file type
			which can be bit-anded with e.g. S_IFREG 
                        or with the following functions

    # The operators -f, -d, -l, -b, -c, -p, and -S.

    S_ISREG($mode) S_ISDIR($mode) S_ISLNK($mode)
    S_ISBLK($mode) S_ISCHR($mode) S_ISFIFO($mode) S_ISSOCK($mode)

    # No direct -X operator counterpart, but for the first one
    # the -g operator is often equivalent.  The ENFMT stands for
    # record flocking enforcement, a platform-dependent feature.

    S_ISENFMT($mode) S_ISWHT($mode)

See your native chmod(2) and stat(2) documentation for more details
about the S_* constants.

=item study SCALAR

=item study

Takes extra time to study SCALAR (C<$_> if unspecified) in anticipation of
doing many pattern matches on the string before it is next modified.
This may or may not save time, depending on the nature and number of
patterns you are searching on, and on the distribution of character
frequencies in the string to be searched--you probably want to compare
run times with and without it to see which runs faster.  Those loops
which scan for many short constant strings (including the constant
parts of more complex patterns) will benefit most.  You may have only
one C<study> active at a time--if you study a different scalar the first
is "unstudied".  (The way C<study> works is this: a linked list of every
character in the string to be searched is made, so we know, for
example, where all the C<'k'> characters are.  From each search string,
the rarest character is selected, based on some static frequency tables
constructed from some C programs and English text.  Only those places
that contain this "rarest" character are examined.)

For example, here is a loop that inserts index-producing entries
before any line containing a certain pattern:

    while (<>) {
	study;
	print ".IX foo\n" 	if /\bfoo\b/;
	print ".IX bar\n" 	if /\bbar\b/;
	print ".IX blurfl\n" 	if /\bblurfl\b/;
	# ...
	print;
    }

In searching for C</\bfoo\b/>, only those locations in C<$_> that contain C<f>
will be looked at, because C<f> is rarer than C<o>.  In general, this is
a big win except in pathological cases.  The only question is whether
it saves you more time than it took to build the linked list in the
first place.

Note that if you have to look for strings that you don't know till
runtime, you can build an entire loop as a string and C<eval> that to
avoid recompiling all your patterns all the time.  Together with
undefining C<$/> to input entire files as one record, this can be very
fast, often faster than specialized programs like fgrep(1).  The following
scans a list of files (C<@files>) for a list of words (C<@words>), and prints
out the names of those files that contain a match:

    $search = 'while (<>) { study;';
    foreach $word (@words) {
	$search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";
    }
    $search .= "}";
    @ARGV = @files;
    undef $/;
    eval $search;		# this screams
    $/ = "\n";		# put back to normal input delimiter
    foreach $file (sort keys(%seen)) {
	print $file, "\n";
    }

=item sub BLOCK

=item sub NAME

=item sub NAME BLOCK

This is a subroutine definition, not a real function I<per se>.  With just a
NAME (and possibly prototypes or attributes), it's just a forward declaration.
Without a NAME, it's an anonymous function declaration, and does actually
return a value: the CODE ref of the closure you just created.  See L<perlsub>
and L<perlref> for details.
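
A small sketch of the anonymous form follows.  The helper
C<make_counter> is hypothetical; it simply shows that each C<sub BLOCK>
evaluated at run time yields a fresh closure with its own state.

```perl
use strict;
use warnings;

# make_counter is a made-up helper: it returns a CODE ref that closes
# over its own private copy of $n.
sub make_counter {
    my $n = shift;
    return sub { return $n++ };
}

my $next  = make_counter(5);
my @seen  = ($next->(), $next->(), $next->());

my $other = make_counter(0);    # independent of $next
my $first = $other->();

print "@seen / $first\n";
```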

=item substr EXPR,OFFSET,LENGTH,REPLACEMENT

=item substr EXPR,OFFSET,LENGTH

=item substr EXPR,OFFSET

Extracts a substring out of EXPR and returns it.  First character is at
offset C<0>, or whatever you've set C<$[> to (but don't do that).
If OFFSET is negative (or more precisely, less than C<$[>), starts
that far from the end of the string.  If LENGTH is omitted, returns
everything to the end of the string.  If LENGTH is negative, leaves that
many characters off the end of the string.

You can use the substr() function as an lvalue, in which case EXPR
must itself be an lvalue.  If you assign something shorter than LENGTH,
the string will shrink, and if you assign something longer than LENGTH,
the string will grow to accommodate it.  To keep the string the same
length you may need to pad or chop your value using C<sprintf>.

If OFFSET and LENGTH specify a substring that is partly outside the
string, only the part within the string is returned.  If the substring
is beyond either end of the string, substr() returns the undefined
value and produces a warning.  When used as an lvalue, specifying a
substring that is entirely outside the string is a fatal error.
Here's an example showing the behavior for boundary cases:

    my $name = 'fred';
    substr($name, 4) = 'dy';		# $name is now 'freddy'
    my $null = substr $name, 6, 2;	# returns '' (no warning)
    my $oops = substr $name, 7;		# returns undef, with warning
    substr($name, 7) = 'gap';		# fatal error

An alternative to using substr() as an lvalue is to specify the
replacement string as the 4th argument.  This allows you to replace
parts of the EXPR and return what was there before in one operation,
just as you can with splice().
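
The sketch below contrasts the four-argument form with the lvalue form
and shows negative OFFSET and LENGTH.  The strings are arbitrary.

```perl
use strict;
use warnings;

my $s   = "Hello, world";
# Four-argument form: replace in place and get the old text back.
my $old = substr($s, 7, 5, "Perl");

my $t = "abcdef";
# Lvalue form: assigning something longer than LENGTH grows the string.
substr($t, 0, 1) = "XYZ";

my $tail = substr($t, -3);      # last three characters
my $trim = substr($t, 1, -1);   # drop first and last characters

print "$old | $s | $t | $tail | $trim\n";
```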

=item symlink OLDFILE,NEWFILE

Creates a new filename symbolically linked to the old filename.
Returns C<1> for success, C<0> otherwise.  On systems that don't support
symbolic links, produces a fatal error at run time.  To check for that,
use eval:

    $symlink_exists = eval { symlink("",""); 1 };

=item syscall LIST

Calls the system call specified as the first element of the list,
passing the remaining elements as arguments to the system call.  If
unimplemented, produces a fatal error.  The arguments are interpreted
as follows: if a given argument is numeric, the argument is passed as
an int.  If not, the pointer to the string value is passed.  You are
responsible to make sure a string is pre-extended long enough to
receive any result that might be written into a string.  You can't use a
string literal (or other read-only string) as an argument to C<syscall>
because Perl has to assume that any string pointer might be written
through.  If your
integer arguments are not literals and have never been interpreted in a
numeric context, you may need to add C<0> to them to force them to look
like numbers.  This emulates the C<syswrite> function (or vice versa):

    require 'syscall.ph';		# may need to run h2ph
    $s = "hi there\n";
    syscall(&SYS_write, fileno(STDOUT), $s, length $s);

Note that Perl supports passing only up to 14 arguments to your system call,
which in practice should usually suffice.

Syscall returns whatever value is returned by the system call it calls.
If the system call fails, C<syscall> returns C<-1> and sets C<$!> (errno).
Note that some system calls can legitimately return C<-1>.  The proper
way to handle such calls is to assign C<$!=0;> before the call and
check the value of C<$!> if syscall returns C<-1>.

There's a problem with C<syscall(&SYS_pipe)>: it returns the file
number of the read end of the pipe it creates.  There is no way
to retrieve the file number of the other end.  You can avoid this 
problem by using C<pipe> instead.

=item sysopen FILEHANDLE,FILENAME,MODE

=item sysopen FILEHANDLE,FILENAME,MODE,PERMS

Opens the file whose filename is given by FILENAME, and associates it
with FILEHANDLE.  If FILEHANDLE is an expression, its value is used as
the name of the real filehandle wanted.  This function calls the
underlying operating system's C<open> function with the parameters
FILENAME, MODE, PERMS.

The possible values and flag bits of the MODE parameter are
system-dependent; they are available via the standard module C<Fcntl>.
See the documentation of your operating system's C<open> to see which
values and flag bits are available.  You may combine several flags
using the C<|>-operator.

Some of the most common values are C<O_RDONLY> for opening the file in
read-only mode, C<O_WRONLY> for opening the file in write-only mode,
and C<O_RDWR> for opening the file in read-write mode.

For historical reasons, some values work on almost every system
supported by perl: zero means read-only, one means write-only, and two
means read/write.  We know that these values do I<not> work under
OS/390 & VM/ESA Unix and on the Macintosh; you probably don't want to
use them in new code.

If the file named by FILENAME does not exist and the C<open> call creates
it (typically because MODE includes the C<O_CREAT> flag), then the value of
PERMS specifies the permissions of the newly created file.  If you omit
the PERMS argument to C<sysopen>, Perl uses the octal value C<0666>.
These permission values need to be in octal, and are modified by your
process's current C<umask>.

In many systems the C<O_EXCL> flag is available for opening files in
exclusive mode.  This is B<not> locking: exclusiveness means here that
if the file already exists, sysopen() fails.  C<O_EXCL> takes precedence
over C<O_TRUNC>.

Sometimes you may want to truncate an already-existing file: C<O_TRUNC>.

You should seldom if ever use C<0644> as argument to C<sysopen>, because
that takes away the user's option to have a more permissive umask.
Better to omit it.  See the perlfunc(1) entry on C<umask> for more
on this.
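
Here is a minimal sketch of exclusive creation with PERMS omitted.  It
assumes C<Fcntl> provides C<O_WRONLY>, C<O_CREAT> and C<O_EXCL> on your
platform; the scratch directory from C<File::Temp> and the file name
F<example.lock> are hypothetical and serve only to keep the example
self-contained.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.lock";     # hypothetical file name

# First attempt creates the file.  PERMS is omitted, so the default
# of 0666 is used, as modified by the process's umask.
my $created = sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL);

# Second attempt fails because the file now exists.
my $again = sysopen(my $fh2, $path, O_WRONLY | O_CREAT | O_EXCL);
my $why   = $again ? '' : "$!";

close($fh) if $created;
print "second open failed: $why\n" unless $again;
```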

Note that C<sysopen> depends on the fdopen() C library function.
On many UNIX systems, fdopen() is known to fail when file descriptors
exceed a certain value, typically 255. If you need more file
descriptors than that, consider rebuilding Perl to use the C<sfio>
library, or perhaps using the POSIX::open() function.

See L<perlopentut> for a kinder, gentler explanation of opening files.

=item sysread FILEHANDLE,SCALAR,LENGTH,OFFSET

=item sysread FILEHANDLE,SCALAR,LENGTH

Attempts to read LENGTH bytes of data into variable SCALAR from the
specified FILEHANDLE, using the system call read(2).  It bypasses stdio,
so mixing this with other kinds of reads, C<print>, C<write>,
C<seek>, C<tell>, or C<eof> can cause confusion because stdio
usually buffers data.  Returns the number of bytes actually read, C<0>
at end of file, or undef if there was an error.  SCALAR will be grown or
shrunk so that the last byte actually read is the last byte of the
scalar after the read.

An OFFSET may be specified to place the read data at some place in the
string other than the beginning.  A negative OFFSET specifies
placement at that many bytes counting backwards from the end of the
string.  A positive OFFSET greater than the length of SCALAR results
in the string being padded to the required size with C<"\0"> bytes before
the result of the read is appended.

There is no syseof() function, which is ok, since eof() doesn't work
very well on device files (like ttys) anyway.  Use sysread() and check
for a return value of 0 to decide whether you're done.
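
A read loop along those lines might look like the following sketch.
The scratch file, the 4096-byte chunk size and the variable names are
assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Prepare a scratch file of 10,000 bytes to read back.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "x" x 10_000;
close($out) or die "close: $!";

open(my $in, '<', $path) or die "open: $!";
my ($buf, $total) = ('', 0);
while (1) {
    # OFFSET is the current length, so each chunk is appended to $buf.
    my $n = sysread($in, $buf, 4096, length $buf);
    die "sysread: $!" unless defined $n;   # undef signals an error
    last if $n == 0;                       # 0 signals end of file
    $total += $n;
}
close($in);

print "read $total bytes\n";
```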

=item sysseek FILEHANDLE,POSITION,WHENCE

Sets FILEHANDLE's system position using the system call lseek(2).  It
bypasses stdio, so mixing this with reads (other than C<sysread>),
C<print>, C<write>, C<seek>, C<tell>, or C<eof> may cause confusion.
FILEHANDLE may be an expression whose value gives the name of the
filehandle.  The values for WHENCE are C<0> to set the new position to
POSITION, C<1> to set it to the current position plus POSITION,
and C<2> to set it to EOF plus POSITION (typically negative).  For
WHENCE, you may also use the constants C<SEEK_SET>, C<SEEK_CUR>, and
C<SEEK_END> (start of the file, current position, end of the file)
from the Fcntl module.

Returns the new position, or the undefined value on failure.  A position
of zero is returned as the string C<"0 but true">; thus C<sysseek> returns
true on success and false on failure, yet you can still easily determine
the new position.
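
The return convention can be seen in this sketch, which uses a scratch
file of five bytes (an assumption made only to keep the example
self-contained):

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_CUR SEEK_END);
use File::Temp qw(tempfile);

my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "hello";
close($out) or die "close: $!";

open(my $in, '<', $path) or die "open: $!";

# Position zero comes back as the string "0 but true": true in a
# boolean test, zero in a numeric one.
my $start = sysseek($in, 0, SEEK_SET);
defined $start or die "sysseek: $!";

my $end = sysseek($in, 0, SEEK_END);    # offset of end of file
my $cur = sysseek($in, 0, SEEK_CUR);    # query position without moving
close($in);

print "start=", $start + 0, " end=$end cur=$cur\n";
```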

=item system LIST

=item system PROGRAM LIST

Does exactly the same thing as C<exec LIST>, except that a fork is
done first, and the parent process waits for the child process to
complete.  Note that argument processing varies depending on the
number of arguments.  If there is more than one argument in LIST,
or if LIST is an array with more than one value, starts the program
given by the first element of the list with arguments given by the
rest of the list.  If there is only one scalar argument, the argument
is checked for shell metacharacters, and if there are any, the
entire argument is passed to the system's command shell for parsing
(this is C</bin/sh -c> on Unix platforms, but varies on other
platforms).  If there are no shell metacharacters in the argument,
it is split into words and passed directly to C<execvp>, which is
more efficient.

Beginning with v5.6.0, Perl will attempt to flush all files opened for
output before any operation that may do a fork, but this may not be
supported on some platforms (see L<perlport>).  To be safe, you may need
to set C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method
of C<IO::Handle> on any open handles.

The return value is the exit status of the program as
returned by the C<wait> call.  To get the actual exit value divide by
256.  See also L</exec>.  This is I<not> what you want to use to capture
the output from a command; for that you should merely use backticks or
C<qx//>, as described in L<perlop/"`STRING`">.  A return value of -1
indicates a failure to start the program (inspect C<$!> for the reason).

Like C<exec>, C<system> allows you to lie to a program about its name if
you use the C<system PROGRAM LIST> syntax.  Again, see L</exec>.

Because C<system> and backticks block C<SIGINT> and C<SIGQUIT>, killing the
program they're running doesn't actually interrupt your program.

    @args = ("command", "arg1", "arg2");
    system(@args) == 0
	 or die "system @args failed: $?";

You can check all the failure possibilities by inspecting
C<$?> like this:

    $exit_value  = $? >> 8;
    $signal_num  = $? & 127;
    $dumped_core = $? & 128;

When the arguments get executed via the system shell, results
and return codes will be subject to its quirks and capabilities.
See L<perlop/"`STRING`"> and L</exec> for details.
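
The status decoding described above can be tried without depending on any
external command.  The following is only a sketch: it assumes that C<$^X>
(the path of the running perl) can be executed as the child, and the exit
code C<3> is an arbitrary value chosen for illustration.

```perl
use strict;
use warnings;

# Run this same perl interpreter ($^X) as the child, so the sketch does
# not depend on any particular external program being installed.
my @args = ($^X, '-e', 'exit 3');

# List form: the arguments are handed to the program without a shell.
my $status = system(@args);

my ($exit_value, $signal_num, $dumped_core);
if ($status == -1) {
    print "failed to start: $!\n";
}
else {
    $exit_value  = $? >> 8;     # high byte: the child's exit code
    $signal_num  = $? & 127;    # low 7 bits: signal that ended it, if any
    $dumped_core = $? & 128;    # bit 7: nonzero if core was dumped
    print "exit=$exit_value signal=$signal_num core=$dumped_core\n";
}
```

Checking for C<-1> first matters, because in that case C<$?> does not
describe a child that actually ran.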

=item syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET

=item syswrite FILEHANDLE,SCALAR,LENGTH

=item syswrite FILEHANDLE,SCALAR

Attempts to write LENGTH bytes of data from variable SCALAR to the
specified FILEHANDLE, using the system call write(2).  If LENGTH
is not specified, writes the whole SCALAR.  It bypasses stdio, so mixing
this with reads (other than C<sysread()>), C<print>, C<write>,
C<seek>, C<tell>, or C<eof> may cause confusion because stdio
usually buffers data.  Returns the number of bytes actually written,
or C<undef> if there was an error.  If the LENGTH is greater than
the available data in the SCALAR after the OFFSET, only as much
data as is available will be written.

An OFFSET may be specified to write the data from some part of the
string other than the beginning.  A negative OFFSET specifies writing
that many bytes counting backwards from the end of the string.  If the
SCALAR is empty, the only OFFSET you may use is zero.
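
Because C<syswrite> may write fewer bytes than requested, callers often
loop until everything has gone out, advancing OFFSET by the count
returned.  The following is a minimal sketch of that idea; C<write_all>
is a hypothetical helper, not a builtin, and it makes no attempt to
retry after an interrupted call.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# write_all() is a hypothetical helper invented for this sketch.
# It returns the number of bytes written, or undef on error.
sub write_all {
    my ($fh, $data) = @_;
    my $offset = 0;
    my $left   = length $data;
    while ($left > 0) {
        my $n = syswrite($fh, $data, $left, $offset);
        return unless defined $n;    # error: the caller can inspect $!
        $offset += $n;               # skip what has already been written
        $left   -= $n;
    }
    return $offset;
}

my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;
my $written = write_all($fh, "hello, world\n");
close $fh or die "close failed: $!";

my $size = -s $filename;
print "wrote $written bytes, file size is $size\n";
```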

=item tell FILEHANDLE

=item tell

Returns the current position for FILEHANDLE, or -1 on error.  FILEHANDLE
may be an expression whose value gives the name of the actual filehandle.
If FILEHANDLE is omitted, assumes the file last read.  

The return value of tell() for the standard streams such as STDIN
depends on the operating system: it may return -1 or something else.
tell() on pipes, fifos, and sockets usually returns -1.

There is no C<systell> function.  Use C<sysseek(FH, 0, 1)> for that.
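
As a rough illustration, the sketch below reports a position once through
the buffered layer with C<tell> and once through the unbuffered layer with
C<sysseek>.  The temporary file and its contents are assumptions made up
for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file to read back.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "first line\nsecond line\n";
close $fh or die "close failed: $!";

# Buffered access: tell() reports the position after the line just read.
open(my $in, '<', $filename) or die "Can't open $filename: $!";
binmode $in;
my $line  = <$in>;
my $after = tell($in);
close $in;

# Unbuffered access: sysseek(FH, 0, 1) stands in for "systell".
open(my $raw, '<', $filename) or die "Can't open $filename: $!";
binmode $raw;
defined sysread($raw, my $buf, 5) or die "sysread failed: $!";
my $sys_pos = sysseek($raw, 0, 1);
close $raw;

print "tell after one line: $after\n";          # 11
print "sysseek after 5 bytes: $sys_pos\n";      # 5
```

The two handles are opened separately so that buffered and unbuffered
calls are never mixed on the same filehandle.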

=item telldir DIRHANDLE

Returns the current position of the C<readdir> routines on DIRHANDLE.
Value may be given to C<seekdir> to access a particular location in a
directory.  Has the same caveats about possible directory compaction as
the corresponding system library routine.

=item tie VARIABLE,CLASSNAME,LIST

This function binds a variable to a package class that will provide the
implementation for the variable.  VARIABLE is the name of the variable
to be enchanted.  CLASSNAME is the name of a class implementing objects
of correct type.  Any additional arguments are passed to the C<new>
method of the class (meaning C<TIESCALAR>, C<TIEHANDLE>, C<TIEARRAY>,
or C<TIEHASH>).  Typically these are arguments such as might be passed
to the C<dbm_open()> function of C.  The object returned by the C<new>
method is also returned by the C<tie> function, which would be useful
if you want to access other methods in CLASSNAME.

Note that functions such as C<keys> and C<values> may return huge lists
when used on large objects, like DBM files.  You may prefer to use the
C<each> function to iterate over such.  Example:

    # print out history file offsets
    use NDBM_File;
    tie(%HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0);
    while (($key,$val) = each %HIST) {
	print $key, ' = ', unpack('L',$val), "\n";
    }
    untie(%HIST);

A class implementing a hash should have the following methods:

    TIEHASH classname, LIST
    FETCH this, key
    STORE this, key, value
    DELETE this, key
    CLEAR this
    EXISTS this, key
    FIRSTKEY this
    NEXTKEY this, lastkey
    DESTROY this
    UNTIE this

A class implementing an ordinary array should have the following methods:

    TIEARRAY classname, LIST
    FETCH this, key
    STORE this, key, value
    FETCHSIZE this
    STORESIZE this, count
    CLEAR this
    PUSH this, LIST
    POP this
    SHIFT this
    UNSHIFT this, LIST
    SPLICE this, offset, length, LIST
    EXTEND this, count
    DESTROY this
    UNTIE this

A class implementing a file handle should have the following methods:

    TIEHANDLE classname, LIST
    READ this, scalar, length, offset
    READLINE this
    GETC this
    WRITE this, scalar, length, offset
    PRINT this, LIST
    PRINTF this, format, LIST
    BINMODE this
    EOF this
    FILENO this
    SEEK this, position, whence
    TELL this
    OPEN this, mode, LIST
    CLOSE this
    DESTROY this
    UNTIE this

A class implementing a scalar should have the following methods:

    TIESCALAR classname, LIST
    FETCH this
    STORE this, value
    DESTROY this
    UNTIE this

Not all methods indicated above need be implemented.  See L<perltie>,
L<Tie::Hash>, L<Tie::Array>, L<Tie::Scalar>, and L<Tie::Handle>.

Unlike C<dbmopen>, the C<tie> function will not use or require a module
for you--you need to do that explicitly yourself.  See L<DB_File>
or the F<Config> module for interesting C<tie> implementations.

For further details see L<perltie>, L<"tied VARIABLE">.
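
To make the scalar method list concrete, here is a minimal sketch of a
tied scalar.  The class name C<UpperCase> and its behavior (storing
everything in uppercase) are invented for illustration only; real
classes usually do something more useful.

```perl
use strict;
use warnings;

# UpperCase is a hypothetical class made up for this sketch.
package UpperCase;

sub TIESCALAR {
    my ($class, $initial) = @_;
    my $value = uc(defined $initial ? $initial : '');
    return bless \$value, $class;    # the object is a scalar reference
}

sub FETCH {
    my ($self) = @_;
    return $$self;
}

sub STORE {
    my ($self, $new) = @_;
    $$self = uc $new;
    return;
}

package main;

# The extra argument 'hello' is passed on to TIESCALAR.
my $object = tie(my $shout, 'UpperCase', 'hello');
my $class  = ref $object;
my $first  = $shout;                 # calls FETCH

$shout = 'quiet words';              # calls STORE
my $second = $shout;

undef $object;                       # drop our reference before untie
untie $shout;

print "$class: $first, then $second\n";
```

Only C<TIESCALAR>, C<FETCH>, and C<STORE> are implemented here; the
sketch never triggers C<DESTROY> or C<UNTIE> behavior of its own.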

=item tied VARIABLE

Returns a reference to the object underlying VARIABLE (the same value
that was originally returned by the C<tie> call that bound the variable
to a package).  Returns the undefined value if VARIABLE isn't tied to a
package.
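
A short sketch of both outcomes follows.  It assumes the C<Tie::StdHash>
class that ships in the C<Tie::Hash> module; the variable names are
arbitrary.

```perl
use strict;
use warnings;
require Tie::Hash;    # also defines the Tie::StdHash class

my %plain;            # never tied
my %bound;
tie(%bound, 'Tie::StdHash');

# tied() yields undef for an ordinary variable, the object otherwise.
my $plain_is_tied = defined tied(%plain) ? 'yes' : 'no';
my $bound_class   = ref tied(%bound);

untie %bound;

print "plain tied? $plain_is_tied\n";    # no
print "bound class: $bound_class\n";     # Tie::StdHash
```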

=item time

Returns the number of non-leap seconds since whatever time the system
considers to be the epoch (that's 00:00:00, January 1, 1904 for MacOS,
and 00:00:00 UTC, January 1, 1970 for most other systems).
Suitable for feeding to C<gmtime> and C<localtime>.

For measuring time in better granularity than one second,
you may use either the Time::HiRes module from CPAN, or,
if you have gettimeofday(2), you may be able to use the
C<syscall> interface of Perl.  See L<perlfaq8> for details.
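
Assuming the Time::HiRes module is installed, a sub-second measurement
might look like the sketch below.  The 0.05 second pause is an arbitrary
stand-in for real work.

```perl
use strict;
use warnings;
use Time::HiRes ();    # nothing imported; call the functions by full name

my $whole = time;                     # builtin: whole seconds
my $start = Time::HiRes::time();      # floating-point seconds
Time::HiRes::sleep(0.05);             # stand-in for the work being timed
my $elapsed = Time::HiRes::time() - $start;

printf "epoch seconds: %d, elapsed: about %.3f s\n", $whole, $elapsed;
```

Importing nothing keeps the builtin C<time> and the module's version
clearly apart in the same file.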

=item times

Returns a four-element list giving the user and system times, in
seconds, for this process and the children of this process.

    ($user,$system,$cuser,$csystem) = times;

=item tr///

The transliteration operator.  Same as C<y///>.  See L<perlop>.

=item truncate FILEHANDLE,LENGTH

=item truncate EXPR,LENGTH

Truncates the file opened on FILEHANDLE, or named by EXPR, to the
specified length.  Produces a fatal error if truncate isn't implemented
on your system.  Returns true if successful, the undefined value
otherwise.
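
A minimal sketch, using a scratch file created just for the example and
the EXPR (file name) form of the call:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "0123456789";
close $fh or die "close failed: $!";

# Keep only the first four bytes of the file.
truncate($filename, 4) or die "Can't truncate $filename: $!";

my $size = -s $filename;
print "size after truncate: $size\n";    # 4
```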

=item uc EXPR

=item uc

Returns an uppercased version of EXPR.  This is the internal function
implementing the C<\U> escape in double-quoted strings.
Respects the current LC_CTYPE locale if C<use locale> is in force.  See
L<perllocale>.  Under Unicode (C<use utf8>) it uses the standard Unicode
uppercase mappings.  (It does not attempt to do titlecase mapping on
initial letters.  See C<ucfirst> for that.)

If EXPR is omitted, uses C<$_>.

=item ucfirst EXPR

=item ucfirst

Returns the value of EXPR with the first character
in uppercase (titlecase in Unicode).  This is
the internal function implementing the C<\u> escape in double-quoted strings.
Respects the current LC_CTYPE locale if C<use locale> is in force.  See
L<perllocale> and L<utf8>.

If EXPR is omitted, uses C<$_>.
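
The relationship between C<uc>, C<ucfirst>, and the C<\U> and C<\u>
escapes can be checked with plain ASCII input, as in this small sketch
(the sample string is arbitrary):

```perl
use strict;
use warnings;

my $name = 'perl hacker';

my $upper = uc $name;         # whole string uppercased
my $cap   = ucfirst $name;    # only the first character changed

# The escapes in double-quoted strings give the same results.
my $upper_escape = "\U$name";
my $cap_escape   = "\u$name";

print "$upper\n";    # PERL HACKER
print "$cap\n";      # Perl hacker
```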

=item umask EXPR

=item umask

Sets the umask for the process to EXPR and returns the previous value.
If EXPR is omitted, merely returns the current umask.

The Unix permission C<rwxr-x---> is represented as three sets of three
bits, or three octal digits: C<0750> (the leading 0 indicates octal
and isn't one of the digits).  The C<umask> value is such a number
representing disabled permissions bits.  The permission (or "mode")
values you pass C<mkdir> or C<sysopen> are modified by your umask, so
even if you tell C<sysopen> to create a file with permissions C<0777>,
if your umask is C<0022> then the file will actually be created with
permissions C<0755>.  If your C<umask> were C<0027> (group can't
write; others can't read, write, or execute), then passing
C<sysopen> C<0666> would create a file with mode C<0640> (C<0666 &~
027> is C<0640>).
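
The arithmetic above can be reproduced directly.  This sketch only
computes the masks; it does not change the process umask or create any
files, and the third case (C<0077>) is added purely for comparison.

```perl
use strict;
use warnings;

# Each pair is [requested mode, umask]; the effective mode clears every
# permission bit that is set in the umask.
my @results;
for my $case ([0777, 0022], [0666, 0027], [0666, 0077]) {
    my ($requested, $mask) = @$case;
    my $effective = $requested & ~$mask;
    push @results, sprintf("%04o", $effective);
    printf "%04o & ~%04o = %04o\n", $requested, $mask, $effective;
}
# prints:
#   0777 & ~0022 = 0755
#   0666 & ~0027 = 0640
#   0666 & ~0077 = 0600
```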

Here's some advice: supply a creation mode of C<0666> for regular
files (in C<sysopen>) and one of C<0777> for directories (in
C<mkdir>) and executable files.  This gives users the freedom of
choice: if they want protected files, they might choose process umasks
of C<022>, C<027>, or even the particularly antisocial mask of C<077>.
Programs should rarely if ever make policy decisions better left to
the user.  The exception to this is when writing files that should be
kept private: mail files, web browser cookies, I<.rhosts> files, and
so on.

If umask(2) is not implemented on your system and you are trying to
restrict access for I<yourself> (i.e., (EXPR & 0700) > 0), produces a
fatal error at run time.  If umask(2) is not implemented and you are
not trying to restrict access for yourself, returns C<undef>.

Remember that a umask is a number, usually given in octal; it is I<not> a
string of octal digits.  See also L</oct>, if all you have is a string.

=item undef EXPR

=item undef

Undefines the value of EXPR, which must be an lvalue.  Use only on a
scalar value, an array (using C<@>), a hash (using C<%>), a subroutine
(using C<&>), or a typeglob (using C<*>).  (Saying C<undef $hash{$key}>
will probably not do what you expect on most predefined variables or
DBM list values, so don't do that; see L</delete>.)  Always returns the
undefined value.  You can omit the EXPR, in which case nothing is
undefined, but you still get an undefined value that you could, for
instance, return from a subroutine, assign to a variable or pass as a
parameter.  Examples:

    undef $foo;
    undef $bar{'blurfl'};      # Compare to: delete $bar{'blurfl'};
    undef @ary;
    undef %hash;
    undef &mysub;
    undef *xyz;       # destroys $xyz, @xyz, %xyz, &xyz, etc.
    return (wantarray ? (undef, $errmsg) : undef) if $they_blew_it;
    select undef, undef, undef, 0.25;
    ($a, $b, undef, $c) = &foo;       # Ignore third value returned

Note that this is a unary operator, not a list operator.

=item unlink LIST

=item unlink

Deletes a list of files.  Returns the number of files successfully
deleted.

    $cnt = unlink 'a', 'b', 'c';
    unlink @goners;
    unlink <*.bak>;

Note: C<unlink> will not delete directories unless you are superuser and
the B<-U> flag is supplied to Perl.  Even if these conditions are
met, be warned that unlinking a directory can inflict damage on your
filesystem.  Use C<rmdir> instead.

If LIST is omitted, uses C<$_>.

=item unpack TEMPLATE,EXPR

C<unpack> does the reverse of C<pack>: it takes a string
and expands it out into a list of values.
(In scalar context, it returns merely the first value produced.)

The string is broken into chunks described by the TEMPLATE.  Each chunk
is converted separately to a value.  Typically, either the string is a result
of C<pack>, or the bytes of the string represent a C structure of some
kind.

The TEMPLATE has the same format as in the C<pack> function.
Here's a subroutine that does substring:

    sub substr {
	my($what,$where,$howmuch) = @_;
	unpack("x$where a$howmuch", $what);
    }

and then there's

    sub ordinal { unpack("C",$_[0]); } # same as ord() for byte values

In addition to fields allowed in pack(), you may prefix a field with
a %<number> to indicate that
you want a <number>-bit checksum of the items instead of the items
themselves.  Default is a 16-bit checksum.  Checksum is calculated by
summing numeric values of expanded values (for string fields the sum of
C<ord($char)> is taken, for bit fields the sum of zeroes and ones).

For example, the following
computes the same number as the System V sum program:

    $checksum = do {
	local $/;  # slurp!
	unpack("%32C*",<>) % 65535;
    };

The following efficiently counts the number of set bits in a bit vector:

    $setbits = unpack("%32b*", $selectmask);

The C<p> and C<P> formats should be used with care.  Since Perl
has no way of checking whether the value passed to C<unpack()>
corresponds to a valid memory location, passing a pointer value that's
not known to be valid is likely to have disastrous consequences.

If the repeat count of a field is larger than what the remainder of
the input string allows, the repeat count is decreased.  If the input
string is longer than the one described by the TEMPLATE, the rest is
ignored.

See L</pack> for more examples and notes.
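
The checksum prefix and the repeat-count rule can be seen on small,
hand-checkable inputs.  The strings below are arbitrary examples chosen
so that the results are easy to verify by hand.

```perl
use strict;
use warnings;

# Sum of the byte values of "abc": 97 + 98 + 99.
my $sum = unpack("%32C*", "abc");

# Number of set bits: 0x0f has four, 0x01 has one.
my $setbits = unpack("%32b*", "\x0f\x01");

# The repeat count 10 exceeds the 3 bytes available, so it is reduced.
my ($short) = unpack("a10", "abc");

# Input beyond what the template describes is ignored.
my ($head) = unpack("a3", "foobar");

print "sum=$sum setbits=$setbits short=$short head=$head\n";
# prints: sum=294 setbits=5 short=abc head=foo
```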

=item untie VARIABLE

Breaks the binding between a variable and a package.  (See C<tie>.)

=item unshift ARRAY,LIST

Does the opposite of a C<shift>.  Or the opposite of a C<push>,
depending on how you look at it.  Prepends list to the front of the
array, and returns the new number of elements in the array.

    unshift(@ARGV, '-e') unless $ARGV[0] =~ /^-/;

Note the LIST is prepended whole, not one element at a time, so the
prepended elements stay in the same order.  Use C<reverse> to do the
reverse.
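
A small sketch of both orderings, with made-up list contents:

```perl
use strict;
use warnings;

my @queue = (4, 5);
my $count = unshift(@queue, 1, 2, 3);    # the list goes in as a unit

my @reversed = (4, 5);
unshift(@reversed, reverse(1, 2, 3));    # reverse the list first

print "count=$count queue=@queue\n";     # count=5 queue=1 2 3 4 5
print "reversed=@reversed\n";            # reversed=3 2 1 4 5
```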

=item use Module VERSION LIST

=item use Module VERSION

=item use Module LIST

=item use Module

=item use VERSION

Imports some semantics into the current package from the named module,
generally by aliasing certain subroutine or variable names into your
package.  It is exactly equivalent to

    BEGIN { require Module; import Module LIST; }

except that Module I<must> be a bareword.

VERSION, which can be specified as a literal of the form v5.6.1, demands
that the current version of Perl (C<$^V> or $PERL_VERSION) be at least
as recent as that version.  (For compatibility with older versions of Perl,
a numeric literal will also be interpreted as VERSION.)  If the version
of the running Perl interpreter is less than VERSION, then an error
message is printed and Perl exits immediately without attempting to
parse the rest of the file.  Compare with L</require>, which can do a
similar check at run time.

    use v5.6.1;		# compile time version check
    use 5.6.1;		# ditto
    use 5.005_03;	# float version allowed for compatibility

This is often useful if you need to check the current Perl version before
C<use>ing library modules that have changed in incompatible ways from
older versions of Perl.  (We try not to do this more than we have to.)

The C<BEGIN> forces the C<require> and C<import> to happen at compile time.  The
C<require> makes sure the module is loaded into memory if it hasn't been
yet.  The C<import> is not a builtin--it's just an ordinary static method
call into the C<Module> package to tell the module to import the list of
features back into the current package.  The module can implement its
C<import> method any way it likes, though most modules just choose to
derive their C<import> method via inheritance from the C<Exporter> class that
is defined in the C<Exporter> module.  See L<Exporter>.  If no C<import>
method can be found then the call is skipped.

If you do not want to call the package's C<import> method (for instance,
to stop your namespace from being altered), explicitly supply the empty list:

    use Module ();

That is exactly equivalent to

    BEGIN { require Module }

If the VERSION argument is present between Module and LIST, then the
C<use> will call the VERSION method in class Module with the given
version as an argument.  The default VERSION method, inherited from
the UNIVERSAL class, croaks if the given version is larger than the
value of the variable C<$Module::VERSION>. 

Again, there is a distinction between omitting LIST (C<import> called
with no arguments) and an explicit empty LIST C<()> (C<import> not
called).  Note that there is no comma after VERSION!
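
The forms discussed above might be combined as in the following sketch.
It assumes the List::Util and File::Basename modules are available; the
minimum version C<1.0> is an arbitrary figure chosen for illustration.

```perl
use strict;
use warnings;

# VERSION sits between the module name and the import LIST, no comma.
use List::Util 1.0 qw(first);

# An explicit empty list loads the module but skips its import method,
# so basename() is reachable only by its fully qualified name.
use File::Basename ();

my $hit      = first { $_ > 2 } (1, 2, 3, 4);
my $leaf     = File::Basename::basename('/usr/lib/news/history');
my $imported = __PACKAGE__->can('basename') ? 'yes' : 'no';

print "first match: $hit\n";                # first match: 3
print "leaf name: $leaf\n";                 # leaf name: history
print "basename imported? $imported\n";     # basename imported? no
```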

Because this is a wide-open interface, pragmas (compiler directives)
are also implemented this way.  Currently implemented pragmas are:

    use constant;
    use diagnostics;
    use integer;
    use sigtrap  qw(SEGV BUS);
    use strict   qw(subs vars refs);
    use subs     qw(afunc blurfl);
    use warnings qw(all);

Some of these pseudo-modules import semantics into the current
block scope (like C<strict> or C<integer>), unlike ordinary modules,
which import symbols into the current package (and are effective
through the end of the file).

There's a corresponding C<no> command that unimports meanings imported
by C<use>, i.e., it calls C<unimport Module LIST> instead of C<import>.

    no integer;
    no strict 'refs';
    no warnings;

If no C<unimport> method can be found the call fails with a fatal error.

See L<perlmodlib> for a list of standard modules and pragmas.  See L<perlrun>
for the C<-M> and C<-m> command-line options to perl that give C<use>
functionality from the command-line.

=item utime LIST

Changes the access and modification times on each file of a list of
files.  The first two elements of the list must be the NUMERICAL access
and modification times, in that order.  Returns the number of files
successfully changed.  The inode change time of each file is set
to the current time.  This code has the same effect as the C<touch>
command if the files already exist:

    #!/usr/bin/perl
    $now = time;
    utime $now, $now, @ARGV;
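
To set a specific time rather than the current one, pass the same kind
of epoch value that C<time> returns.  In this sketch the timestamp
1_000_000_000 and the scratch file are arbitrary choices for
illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile(UNLINK => 1);
close $fh or die "close failed: $!";

my $stamp   = 1_000_000_000;    # an arbitrary moment in September 2001
my $changed = utime $stamp, $stamp, $filename;

# Element 9 of the stat list is the modification time.
my $mtime = (stat $filename)[9];

print "files changed: $changed, mtime now $mtime\n";
```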

=item values HASH

Returns a list consisting of all the values of the named hash.  (In a
scalar context, returns the number of values.)  The values are
returned in an apparently random order.  The actual random order is
subject to change in future versions of perl, but it is guaranteed to
be the same order as either the C<keys> or C<each> function would
produce on the same (unmodified) hash.

Note that the values are not copied, which means modifying them will
modify the contents of the hash:

    for (values %hash) 	    { s/foo/bar/g }   # modifies %hash values
    for (@hash{keys %hash}) { s/foo/bar/g }   # same

As a side effect, calling values() resets the HASH's internal iterator.
See also C<keys>, C<each>, and C<sort>.
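
Both properties, aliasing and the shared ordering with C<keys>, can be
observed in a few lines.  The hash contents here are invented for the
example.

```perl
use strict;
use warnings;

my %price = (apple => 1, pear => 2, plum => 3);

# The loop variable aliases each stored value, so this edits the hash.
$_ *= 10 for values %price;

# On an unmodified hash, keys and values line up element by element.
my @k = keys %price;
my @v = values %price;
my $consistent = 1;
for my $i (0 .. $#k) {
    $consistent = 0 unless $price{ $k[$i] } == $v[$i];
}

print "apple=$price{apple} pear=$price{pear} plum=$price{plum}\n";
# prints: apple=10 pear=20 plum=30
```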

=item vec EXPR,OFFSET,BITS

Treats the string in EXPR as a bit vector made up of elements of
width BITS, and returns the value of the element specified by OFFSET
as an unsigned integer.  BITS therefore specifies the number of bits
that are reserved for each element in the bit vector.  This must
be a power of two from 1 to 32 (or 64, if your platform supports
that).

If BITS is 8, "elements" coincide with bytes of the input string.  

If BITS is 16 or more, bytes of the input string are grouped into chunks
of size BITS/8, and each group is converted to a number as with
pack()/unpack() with big-endian formats C<n>/C<N> (and analogously
for BITS==64).  See L<"pack"> for details.

If BITS is 4 or less, the string is broken into bytes, then the bits
of each byte are broken into 8/BITS groups.  Bits of a byte are
numbered in a little-endian-ish way, as in C<0x01>, C<0x02>,
C<0x04>, C<0x08>, C<0x10>, C<0x20>, C<0x40>, C<0x80>.  For example,
breaking the single input byte C<chr(0x36)> into two groups gives a list
C<(0x6, 0x3)>; breaking it into 4 groups gives C<(0x2, 0x1, 0x3, 0x0)>.
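
The C<chr(0x36)> example can be verified directly with a short sketch:

```perl
use strict;
use warnings;

my $byte = chr(0x36);    # bit pattern 0011 0110

# Two 4-bit groups, low nibble first.
my @nibbles = map { vec($byte, $_, 4) } 0 .. 1;

# Four 2-bit groups, again starting from the low-order bits.
my @pairs = map { vec($byte, $_, 2) } 0 .. 3;

print "nibbles: @nibbles\n";    # nibbles: 6 3
print "pairs:   @pairs\n";      # pairs:   2 1 3 0
```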

C<vec> may also be assigned to, in which case parentheses are needed
to give the expression the correct precedence as in

    vec($image, $max_x * $x + $y, 8) = 3;

If the selected element is outside the string, the value 0 is returned.
If an element off the end of the string is written to, Perl will first
extend the string with sufficiently many zero bytes.   It is an error
to try to write off the beginning of the string (i.e. negative OFFSET).
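
A brief sketch of reading and writing past the end of a one-byte string
(the values are arbitrary):

```perl
use strict;
use warnings;

my $str = "A";

# Reading past the end yields 0 and leaves the string alone.
my $beyond     = vec($str, 3, 8);
my $len_before = length $str;

# Writing past the end pads the gap with zero bytes first.
vec($str, 3, 8) = 0x42;    # 0x42 is 'B'
my $len_after = length $str;

print "read beyond end: $beyond\n";                 # 0
print "length: $len_before -> $len_after\n";        # 1 -> 4
```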

The string should not contain any character with the value > 255 (which
can only happen if you're using UTF8 encoding).  If it does, it will be
treated as something which is not UTF8 encoded.  When C<vec> is
assigned to, other parts of your program will also no longer consider the
string to be UTF8 encoded.  In other words, if you do have such characters
in your string, vec() will operate on the actual byte string, and not the
conceptual character string.

Strings created with C<vec> can also be manipulated with the logical
operators C<|>, C<&>, C<^>, and C<~>.  These operators will assume a bit
vector operation is desired when both operands are strings.
See L<perlop/"Bitwise String Operators">.

The following code will build up an ASCII string saying C<'PerlPerlPerl'>.
The comments show the string after each step.  Note that this code works
in the same way on big-endian or little-endian machines.

    my $foo = '';
    vec($foo,  0, 32) = 0x5065726C;	# 'Perl'

    # $foo eq "Perl" eq "\x50\x65\x72\x6C", 32 bits
    print vec($foo, 0, 8);		# prints 80 == 0x50 == ord('P')

    vec($foo,  2, 16) = 0x5065;		# 'PerlPe'
    vec($foo,  3, 16) = 0x726C;		# 'PerlPerl'
    vec($foo,  8,  8) = 0x50;		# 'PerlPerlP'
    vec($foo,  9,  8) = 0x65;		# 'PerlPerlPe'
    vec($foo, 20,  4) = 2;		# 'PerlPerlPe'   . "\x02"
    vec($foo, 21,  4) = 7;		# 'PerlPerlPer'
                                        # 'r' is "\x72"
    vec($foo, 45,  2) = 3;		# 'PerlPerlPer'  . "\x0c"
    vec($foo, 93,  1) = 1;		# 'PerlPerlPer'  . "\x2c"
    vec($foo, 94,  1) = 1;		# 'PerlPerlPerl'
                                        # 'l' is "\x6c"

To transform a bit vector into a string or list of 0's and 1's, use these:

    $bits = unpack("b*", $vector);
    @bits = split(//, unpack("b*", $vector));

If you know the exact length in bits, it can be used in place of the C<*>.

Here is an example to illustrate how the bits actually fall in place:

    #!/usr/bin/perl -wl

    print <<'EOT';
                                      0         1         2         3  
                       unpack("V",$_) 01234567890123456789012345678901
    ------------------------------------------------------------------
    EOT

    for $w (0..3) {
        $width = 2**$w;
        for ($shift=0; $shift < $width; ++$shift) {
            for ($off=0; $off < 32/$width; ++$off) {
                $str = pack("B*", "0"x32);
                $bits = (1<<$shift);
                vec($str, $off, $width) = $bits;
                $res = unpack("b*",$str);
                $val = unpack("V", $str);
                write;
            }
        }
    }

    format STDOUT =
    vec($_,@#,@#) = @<< == @######### @>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    $off, $width, $bits, $val, $res
    .
    __END__

Regardless of the machine architecture on which it is run, the above
example should print the following table:

                                      0         1         2         3  
                       unpack("V",$_) 01234567890123456789012345678901
    ------------------------------------------------------------------
    vec($_, 0, 1) = 1   ==          1 10000000000000000000000000000000
    vec($_, 1, 1) = 1   ==          2 01000000000000000000000000000000
    vec($_, 2, 1) = 1   ==          4 00100000000000000000000000000000
    vec($_, 3, 1) = 1   ==          8 00010000000000000000000000000000
    vec($_, 4, 1) = 1   ==         16 00001000000000000000000000000000
    vec($_, 5, 1) = 1   ==         32 00000100000000000000000000000000
    vec($_, 6, 1) = 1   ==         64 00000010000000000000000000000000
    vec($_, 7, 1) = 1   ==        128 00000001000000000000000000000000
    vec($_, 8, 1) = 1   ==        256 00000000100000000000000000000000
    vec($_, 9, 1) = 1   ==        512 00000000010000000000000000000000
    vec($_,10, 1) = 1   ==       1024 00000000001000000000000000000000
    vec($_,11, 1) = 1   ==       2048 00000000000100000000000000000000
    vec($_,12, 1) = 1   ==       4096 00000000000010000000000000000000
    vec($_,13, 1) = 1   ==       8192 00000000000001000000000000000000
    vec($_,14, 1) = 1   ==      16384 00000000000000100000000000000000
    vec($_,15, 1) = 1   ==      32768 00000000000000010000000000000000
    vec($_,16, 1) = 1   ==      65536 00000000000000001000000000000000
    vec($_,17, 1) = 1   ==     131072 00000000000000000100000000000000
    vec($_,18, 1) = 1   ==     262144 00000000000000000010000000000000
    vec($_,19, 1) = 1   ==     524288 00000000000000000001000000000000
    vec($_,20, 1) = 1   ==    1048576 00000000000000000000100000000000
    vec($_,21, 1) = 1   ==    2097152 00000000000000000000010000000000
    vec($_,22, 1) = 1   ==    4194304 00000000000000000000001000000000
    vec($_,23, 1) = 1   ==    8388608 00000000000000000000000100000000
    vec($_,24, 1) = 1   ==   16777216 00000000000000000000000010000000
    vec($_,25, 1) = 1   ==   33554432 00000000000000000000000001000000
    vec($_,26, 1) = 1   ==   67108864 00000000000000000000000000100000
    vec($_,27, 1) = 1   ==  134217728 00000000000000000000000000010000
    vec($_,28, 1) = 1   ==  268435456 00000000000000000000000000001000
    vec($_,29, 1) = 1   ==  536870912 00000000000000000000000000000100
    vec($_,30, 1) = 1   == 1073741824 00000000000000000000000000000010
    vec($_,31, 1) = 1   == 2147483648 00000000000000000000000000000001
    vec($_, 0, 2) = 1   ==          1 10000000000000000000000000000000
    vec($_, 1, 2) = 1   ==          4 00100000000000000000000000000000
    vec($_, 2, 2) = 1   ==         16 00001000000000000000000000000000
    vec($_, 3, 2) = 1   ==         64 00000010000000000000000000000000
    vec($_, 4, 2) = 1   ==        256 00000000100000000000000000000000
    vec($_, 5, 2) = 1   ==       1024 00000000001000000000000000000000
    vec($_, 6, 2) = 1   ==       4096 00000000000010000000000000000000
    vec($_, 7, 2) = 1   ==      16384 00000000000000100000000000000000
    vec($_, 8, 2) = 1   ==      65536 00000000000000001000000000000000
    vec($_, 9, 2) = 1   ==     262144 00000000000000000010000000000000
    vec($_,10, 2) = 1   ==    1048576 00000000000000000000100000000000
    vec($_,11, 2) = 1   ==    4194304 00000000000000000000001000000000
    vec($_,12, 2) = 1   ==   16777216 00000000000000000000000010000000
    vec($_,13, 2) = 1   ==   67108864 00000000000000000000000000100000
    vec($_,14, 2) = 1   ==  268435456 00000000000000000000000000001000
    vec($_,15, 2) = 1   == 1073741824 00000000000000000000000000000010
    vec($_, 0, 2) = 2   ==          2 01000000000000000000000000000000
    vec($_, 1, 2) = 2   ==          8 00010000000000000000000000000000
    vec($_, 2, 2) = 2   ==         32 00000100000000000000000000000000
    vec($_, 3, 2) = 2   ==        128 00000001000000000000000000000000
    vec($_, 4, 2) = 2   ==        512 00000000010000000000000000000000
    vec($_, 5, 2) = 2   ==       2048 00000000000100000000000000000000
    vec($_, 6, 2) = 2   ==       8192 00000000000001000000000000000000
    vec($_, 7, 2) = 2   ==      32768 00000000000000010000000000000000
    vec($_, 8, 2) = 2   ==     131072 00000000000000000100000000000000
    vec($_, 9, 2) = 2   ==     524288 00000000000000000001000000000000
    vec($_,10, 2) = 2   ==    2097152 00000000000000000000010000000000
    vec($_,11, 2) = 2   ==    8388608 00000000000000000000000100000000
    vec($_,12, 2) = 2   ==   33554432 00000000000000000000000001000000
    vec($_,13, 2) = 2   ==  134217728 00000000000000000000000000010000
    vec($_,14, 2) = 2   ==  536870912 00000000000000000000000000000100
    vec($_,15, 2) = 2   == 2147483648 00000000000000000000000000000001
    vec($_, 0, 4) = 1   ==          1 10000000000000000000000000000000
    vec($_, 1, 4) = 1   ==         16 00001000000000000000000000000000
    vec($_, 2, 4) = 1   ==        256 00000000100000000000000000000000
    vec($_, 3, 4) = 1   ==       4096 00000000000010000000000000000000
    vec($_, 4, 4) = 1   ==      65536 00000000000000001000000000000000
    vec($_, 5, 4) = 1   ==    1048576 00000000000000000000100000000000
    vec($_, 6, 4) = 1   ==   16777216 00000000000000000000000010000000
    vec($_, 7, 4) = 1   ==  268435456 00000000000000000000000000001000
    vec($_, 0, 4) = 2   ==          2 01000000000000000000000000000000
    vec($_, 1, 4) = 2   ==         32 00000100000000000000000000000000
    vec($_, 2, 4) = 2   ==        512 00000000010000000000000000000000
    vec($_, 3, 4) = 2   ==       8192 00000000000001000000000000000000
    vec($_, 4, 4) = 2   ==     131072 00000000000000000100000000000000
    vec($_, 5, 4) = 2   ==    2097152 00000000000000000000010000000000
    vec($_, 6, 4) = 2   ==   33554432 00000000000000000000000001000000
    vec($_, 7, 4) = 2   ==  536870912 00000000000000000000000000000100
    vec($_, 0, 4) = 4   ==          4 00100000000000000000000000000000
    vec($_, 1, 4) = 4   ==         64 00000010000000000000000000000000
    vec($_, 2, 4) = 4   ==       1024 00000000001000000000000000000000
    vec($_, 3, 4) = 4   ==      16384 00000000000000100000000000000000
    vec($_, 4, 4) = 4   ==     262144 00000000000000000010000000000000
    vec($_, 5, 4) = 4   ==    4194304 00000000000000000000001000000000
    vec($_, 6, 4) = 4   ==   67108864 00000000000000000000000000100000
    vec($_, 7, 4) = 4   == 1073741824 00000000000000000000000000000010
    vec($_, 0, 4) = 8   ==          8 00010000000000000000000000000000
    vec($_, 1, 4) = 8   ==        128 00000001000000000000000000000000
    vec($_, 2, 4) = 8   ==       2048 00000000000100000000000000000000
    vec($_, 3, 4) = 8   ==      32768 00000000000000010000000000000000
    vec($_, 4, 4) = 8   ==     524288 00000000000000000001000000000000
    vec($_, 5, 4) = 8   ==    8388608 00000000000000000000000100000000
    vec($_, 6, 4) = 8   ==  134217728 00000000000000000000000000010000
    vec($_, 7, 4) = 8   == 2147483648 00000000000000000000000000000001
    vec($_, 0, 8) = 1   ==          1 10000000000000000000000000000000
    vec($_, 1, 8) = 1   ==        256 00000000100000000000000000000000
    vec($_, 2, 8) = 1   ==      65536 00000000000000001000000000000000
    vec($_, 3, 8) = 1   ==   16777216 00000000000000000000000010000000
    vec($_, 0, 8) = 2   ==          2 01000000000000000000000000000000
    vec($_, 1, 8) = 2   ==        512 00000000010000000000000000000000
    vec($_, 2, 8) = 2   ==     131072 00000000000000000100000000000000
    vec($_, 3, 8) = 2   ==   33554432 00000000000000000000000001000000
    vec($_, 0, 8) = 4   ==          4 00100000000000000000000000000000
    vec($_, 1, 8) = 4   ==       1024 00000000001000000000000000000000
    vec($_, 2, 8) = 4   ==     262144 00000000000000000010000000000000
    vec($_, 3, 8) = 4   ==   67108864 00000000000000000000000000100000
    vec($_, 0, 8) = 8   ==          8 00010000000000000000000000000000
    vec($_, 1, 8) = 8   ==       2048 00000000000100000000000000000000
    vec($_, 2, 8) = 8   ==     524288 00000000000000000001000000000000
    vec($_, 3, 8) = 8   ==  134217728 00000000000000000000000000010000
    vec($_, 0, 8) = 16  ==         16 00001000000000000000000000000000
    vec($_, 1, 8) = 16  ==       4096 00000000000010000000000000000000
    vec($_, 2, 8) = 16  ==    1048576 00000000000000000000100000000000
    vec($_, 3, 8) = 16  ==  268435456 00000000000000000000000000001000
    vec($_, 0, 8) = 32  ==         32 00000100000000000000000000000000
    vec($_, 1, 8) = 32  ==       8192 00000000000001000000000000000000
    vec($_, 2, 8) = 32  ==    2097152 00000000000000000000010000000000
    vec($_, 3, 8) = 32  ==  536870912 00000000000000000000000000000100
    vec($_, 0, 8) = 64  ==         64 00000010000000000000000000000000
    vec($_, 1, 8) = 64  ==      16384 00000000000000100000000000000000
    vec($_, 2, 8) = 64  ==    4194304 00000000000000000000001000000000
    vec($_, 3, 8) = 64  == 1073741824 00000000000000000000000000000010
    vec($_, 0, 8) = 128 ==        128 00000001000000000000000000000000
    vec($_, 1, 8) = 128 ==      32768 00000000000000010000000000000000
    vec($_, 2, 8) = 128 ==    8388608 00000000000000000000000100000000
    vec($_, 3, 8) = 128 == 2147483648 00000000000000000000000000000001

=item wait

Behaves like the wait(2) system call on your system: it waits for a child
process to terminate and returns the pid of the deceased process, or
C<-1> if there are no child processes.  The status is returned in C<$?>.
Note that a return value of C<-1> could mean that child processes are
being automatically reaped, as described in L<perlipc>.
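
A minimal sketch of the usual pairing with C<fork> follows.  It assumes a
platform with a working fork(2), and the exit code C<7> is arbitrary.

```perl
use strict;
use warnings;

my $pid = fork();
die "Can't fork: $!" unless defined $pid;

if ($pid == 0) {
    exit 7;              # child: finish at once with a known status
}

# Parent: block until the child terminates, then decode $?.
my $reaped    = wait();
my $exit_code = $? >> 8;

print "child $reaped exited with code $exit_code\n";
```

Since only one child was started, the pid returned by C<wait> should be
the one returned by C<fork>.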

=item waitpid PID,FLAGS

Waits for a particular child process to terminate and returns the pid of
the deceased process, or C<-1> if there is no such child process.  On some
systems, a value of 0 indicates that there are processes still running.
The status is returned in C<$?>.  If you say

    use POSIX ":sys_wait_h";
    #...
    do { 
	$kid = waitpid(-1,&WNOHANG);
    } until $kid == -1;

then you can do a non-blocking wait for all pending zombie processes.
Non-blocking wait is available on machines supporting either the
waitpid(2) or wait4(2) system calls.  However, waiting for a particular
pid with FLAGS of C<0> is implemented everywhere.  (Perl emulates the
system call by remembering the status values of processes that have
exited but have not been harvested by the Perl script yet.)

Note that on some systems, a return value of C<-1> could mean that child
processes are being automatically reaped.  See L<perlipc> for details,
and for other examples.
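
Because C<waitpid> is modelled on the waitpid(2) system call, the same
non-blocking reaping loop can be sketched directly in C.  This is only an
illustration of the underlying call on a POSIX system, not of how Perl
implements it; the helper names C<spawn_children> and C<reap_pending> are
invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start n children that exit at once; returns how many were started. */
static int spawn_children(int n) {
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);            /* child: terminate immediately */
        if (pid > 0)
            started++;           /* parent: fork succeeded */
    }
    return started;
}

/* Reap every child that has already exited, without blocking.
   Returns the number reaped and sets *none_left to 1 once waitpid
   reports that no child processes remain (ECHILD). */
static int reap_pending(int *none_left) {
    int reaped = 0;
    int status;
    pid_t kid;

    assert(none_left != NULL);
    *none_left = 0;
    while ((kid = waitpid(-1, &status, WNOHANG)) > 0)
        reaped++;
    if (kid == -1 && errno == ECHILD)
        *none_left = 1;
    return reaped;
}
```

A return of 0 from waitpid(2) ends the inner loop while children are still
running, so a caller would normally invoke C<reap_pending> again later, for
example from a C<SIGCHLD> handler or an event loop.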

=item wantarray

Returns true if the context of the currently executing subroutine is
looking for a list value.  Returns false if the context is looking
for a scalar.  Returns the undefined value if the context is looking
for no value (void context).

    return unless defined wantarray;	# don't bother doing more
    my @a = complex_calculation();
    return wantarray ? @a : "@a";

This function should have been named wantlist() instead.

=item warn LIST

Produces a message on STDERR just like C<die>, but doesn't exit or throw
an exception.

If LIST is empty and C<$@> already contains a value (typically from a
previous eval), that value is used after appending C<"\t...caught">
to C<$@>.  This is useful for staying almost, but not entirely, similar to
C<die>.

If C<$@> is empty then the string C<"Warning: Something's wrong"> is used.

No message is printed if there is a C<$SIG{__WARN__}> handler
installed.  It is the handler's responsibility to deal with the message
as it sees fit (like, for instance, converting it into a C<die>).  Most
handlers must therefore make arrangements to actually display the
warnings that they are not prepared to deal with, by calling C<warn>
again in the handler.  Note that this is quite safe and will not
produce an endless loop, since C<__WARN__> hooks are not called from
inside one.

You will find this behavior is slightly different from that of
C<$SIG{__DIE__}> handlers (which don't suppress the error text, but can
instead call C<die> again to change it).

Using a C<__WARN__> handler provides a powerful way to silence all
warnings (even the so-called mandatory ones).  An example:

    # wipe out *all* compile-time warnings
    BEGIN { $SIG{'__WARN__'} = sub { warn $_[0] if $DOWARN } }
    my $foo = 10;
    my $foo = 20;          # no warning about duplicate my $foo,
                           # but hey, you asked for it!
    # no compile-time or run-time warnings before here
    $DOWARN = 1;

    # run-time warnings enabled after here
    warn "\$foo is alive and $foo!";     # does show up

See L<perlvar> for details on setting C<%SIG> entries, and for more
examples.  See the Carp module for other kinds of warnings using its
carp() and cluck() functions.

=item write FILEHANDLE

=item write EXPR

=item write

Writes a formatted record (possibly multi-line) to the specified FILEHANDLE,
using the format associated with that file.  By default the format for
a file is the one having the same name as the filehandle, but the
format for the current output channel (see the C<select> function) may be set
explicitly by assigning the name of the format to the C<$~> variable.

Top of form processing is handled automatically:  if there is
insufficient room on the current page for the formatted record, the
page is advanced by writing a form feed, a special top-of-page format
is used to format the new page header, and then the record is written.
By default the top-of-page format is the name of the filehandle with
"_TOP" appended, but it may be dynamically set to the format of your
choice by assigning the name to the C<$^> variable while the filehandle is
selected.  The number of lines remaining on the current page is in
variable C<$->, which can be set to C<0> to force a new page.

If FILEHANDLE is unspecified, output goes to the current default output
channel, which starts out as STDOUT but may be changed by the
C<select> operator.  If the FILEHANDLE is an EXPR, then the expression
is evaluated and the resulting string is used to look up the name of
the FILEHANDLE at run time.  For more on formats, see L<perlform>.

Note that write is I<not> the opposite of C<read>.  Unfortunately.

=item y///

The transliteration operator.  Same as C<tr///>.  See L<perlop>.

=back
LAGS(gv)7Usage: B::IO::LINES(io)5io is not a reference6Usage: B::IO::PAGE(io):Usage: B                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          =head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
containing some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16,
as well.)
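
The guarantees described above can be pictured with standard C types.  The
following is a sketch only: the C<my_*> typedefs are hypothetical stand-ins
chosen for illustration, and the real definitions come from F<perl.h> and
depend on how perl was configured.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for Perl's typedefs (assumptions, not perl.h). */
typedef intptr_t      my_IV;   /* signed, wide enough to hold a pointer */
typedef uintptr_t     my_UV;   /* the unsigned counterpart */
typedef int_least32_t my_I32;  /* at least 32 bits */
typedef int_least16_t my_I16;  /* at least 16 bits */

/* Store a pointer in an integer of type my_IV and recover it unchanged. */
static void *roundtrip_through_iv(void *p) {
    my_IV as_int = (my_IV)p;
    return (void *)as_int;
}
```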

=head2 Working with SVs

An SV can be created and loaded with one command.  There are four types of
values that can be loaded: an integer value (IV), a double (NV),
a string (PV), and another scalar (SV).

The six routines are:

    SV*  newSViv(IV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, int);
    SV*  newSVpvn(const char*, int);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, int);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_setpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>.  Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.
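
The practical consequence is easiest to see in plain C.  The helper below is
hypothetical (it is not part of the Perl API); it only models the "0 means
measure it with C<strlen>" convention described above, to show how data
containing an embedded NUL is cut short unless the length is given
explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the number of bytes that would be stored
   when a length of 0 means "measure the string with strlen". */
static size_t stored_length(const char *src, size_t len) {
    assert(src != NULL);
    return len == 0 ? strlen(src) : len;
}
```

With the five data bytes C<"ab\0cd">, a length of 0 yields 2 because
C<strlen> stops at the first NUL, whereas an explicit length of 5 keeps all
of the data.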

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_setpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs.  The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>).  This pointer may be NULL if that information is not
important.  Note that this function requires you to specify the length of
the format.

STRLEN is an integer type (Size_t, usually defined as size_t in
config.h) guaranteed to be large enough to represent the size of 
any string that perl can handle.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic".  See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If it is not NUL-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>).  If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case.  But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl.  In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

	SV *s;
	STRLEN len;
	char * ptr;
	ptr = SvPV(s, len);
	foo(ptr, len);

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated.  If so, it will
call the function C<sv_grow>.  Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for a trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).
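
The grow-only behaviour can be modelled in a few lines of plain C.  This is
a toy sketch, not Perl's implementation: C<struct toy_buf> and C<toy_grow>
are invented names, and C<realloc> stands in for whatever C<sv_grow> does
internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy string buffer; "len" is the allocated size, in the role of SvLEN. */
struct toy_buf {
    char  *pv;
    size_t len;
};

/* Grow-only reallocation: a request that is not larger than the current
   allocation changes nothing.  Returns the buffer pointer, or NULL if
   realloc fails (the old buffer is then left intact). */
static char *toy_grow(struct toy_buf *b, size_t newlen) {
    assert(b != NULL);
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv  = p;
        b->len = newlen;
    }
    return b->pv;
}
```

As with C<SvGROW>, the caller has to ask for one extra byte if room for a
trailing NUL is wanted.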

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_catpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>.  In the second, you specify the length of the string
yourself.  The third function processes its arguments like C<sprintf> and
appends the formatted output.  The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV.  It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic".  See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.  Its
address can be used whenever an C<SV*> is needed.

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain Boolean
TRUE and FALSE values, respectively.  Like C<PL_sv_undef>, their addresses can
be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise.  Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results.  Change the zero to C<&PL_sv_undef> in the first
line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>.  Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>. 

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

  % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
  SV = PVIV(0x8128450) at 0x81340f0
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    IV = 1  (OFFSET)
    PV = 0x8135781 ( "1" . ) "2345"\0
    CUR = 4
    LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.
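
The bookkeeping can be imitated in plain C.  The sketch below is a toy
model under stated assumptions: C<struct toy_sv>, C<toy_chop> and
C<toy_real_start> are invented names, and the offset lives in its own field
rather than in an IV slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy scalar with only the fields the offset hack touches. */
struct toy_sv {
    char  *pv;      /* visible start of the string (role of SvPVX) */
    size_t cur;     /* visible string length       (role of SvCUR) */
    size_t len;     /* visible allocation size     (role of SvLEN) */
    size_t offset;  /* bytes chopped off so far */
};

/* Discard everything before "ptr", which must point inside the string.
   No bytes are moved; only the bookkeeping changes. */
static void toy_chop(struct toy_sv *sv, char *ptr) {
    size_t delta;
    assert(ptr >= sv->pv && ptr <= sv->pv + sv->cur);
    delta = (size_t)(ptr - sv->pv);
    sv->pv     += delta;
    sv->cur    -= delta;
    sv->len    -= delta;
    sv->offset += delta;
}

/* The address that must eventually be handed back to free(). */
static char *toy_real_start(const struct toy_sv *sv) {
    return sv->pv - sv->offset;
}
```

Chopping one byte off C<"12345"> held in a six-byte buffer leaves a visible
string of C<"2345"> with a length of 4 and an allocation of 5, matching the
C<CUR> and C<LEN> values in the dump above.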

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvLEN>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.
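
A toy version of the array case might look like the following.  It is a
sketch for illustration only: C<struct toy_av> and C<toy_shift> are invented
names, the elements are plain C<int>s instead of C<SV*>s, and the real logic
lives in C<av_shift>.

```c
#include <assert.h>
#include <stddef.h>

/* Toy array: "alloc" is the real start of the C array (role of AvALLOC),
   "array" is the first element visible to the program (role of AvARRAY). */
struct toy_av {
    int  *alloc;
    int  *array;
    long  fill;   /* highest visible index, -1 when empty (role of AvFILL) */
    long  len;    /* slots available from "array" onward  (the text's AvLEN) */
};

/* Remove and return the first element by moving the visible start forward;
   no elements are copied.  The array must not be empty. */
static int toy_shift(struct toy_av *av) {
    int first;
    assert(av->fill >= 0);
    first = av->array[0];
    av->array++;
    av->fill--;
    av->len--;
    return first;
}
```

After a shift, C<alloc> still points at the original storage, which is the
address that would be needed when the array is finally freed.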

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros.  Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV.  The "p" stands for private.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV.  The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(I32 num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s.  Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on AVs:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value.  You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    I32   av_len(AV*);
    SV**  av_fetch(AV*, I32 key, I32 lval);
    SV**  av_store(AV*, I32 key, SV* val);

The C<av_len> function returns the highest index value in array (just
like $#array in Perl).  If the array is empty, -1 is returned.  The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>.  Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak.  Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, I32 key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself.  The C<av_undef> function will
delete all the elements in the array plus the array itself.  The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements.  If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on HVs:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key).  The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you).  The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>.  To access the scalar value, you must first dereference the return
value.  However, you should check to make sure that the return value is
not NULL before dereferencing it.

These two functions check whether a hash table entry exists and delete it,
respectively.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table.  The C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead).  The key is a string pointer; the value is an C<SV*>.  However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return a SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value itself is the SV* that the
               routine returns */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:

    hash = 0;
    while (klen--)
	hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5);			/* after 5.6 */

The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.
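
Written out as a self-contained C function, the loop above looks roughly
like this.  It is a sketch under two assumptions that the macro itself does
not state: C<uint32_t> stands in for Perl's C<U32>, and each key byte is
read as an C<unsigned char>.  The name C<toy_perl_hash> is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The loop from the text as a function (post-5.6 variant). */
static uint32_t toy_perl_hash(const char *key, size_t klen) {
    uint32_t hash = 0;
    assert(key != NULL);
    while (klen--)
        hash = (hash * 33) + (unsigned char)*key++;
    hash = hash + (hash >> 5);   /* the extra mixing step added in 5.6 */
    return hash;
}
```

For the one-byte key C<"a"> (byte value 97) the loop gives 97, and the final
step adds C<< 97 >> 5 >>, which is 3, for a result of 100.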

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures.  These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time).  See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries.  Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once.  See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 References

References are a special type of scalar that point to other data types
(including references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>.  The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not.  For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar

See the F<sv.h> header file for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming.  In the
OO lexicon, an object is simply a reference that has been blessed into a
package (or class).  Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference.  The C<stash> argument specifies
which class the reference will belong to.  See
L<Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

Upgrades rv to reference if not already one.  Creates new SV for rv to
point to.  If C<classname> is non-null, the SV is blessed into the specified
class.  SV is returned.

	SV* newSVrv(SV* rv, const char* classname);

Copies integer or double into an SV whose reference is C<rv>.  SV is blessed
if C<classname> is non-null.

	SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
	SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

Copies the pointer value (I<the address, not the string!>) into an SV whose
reference is rv.  SV is blessed if C<classname> is non-null.

	SV* sv_setref_pv(SV* rv, const char* classname, PV iv);

Copies string into an SV whose reference is C<rv>.  Set length to 0 to let
Perl calculate the string length.  SV is blessed if C<classname> is non-null.

	SV* sv_setref_pvn(SV* rv, const char* classname, PV iv, STRLEN length);

Tests whether the SV is blessed into the specified class.  It does not
check inheritance relationships.

	int  sv_isa(SV* sv, const char* name);

Tests whether the SV is a reference to a blessed object.

	int  sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. SV can be either
a reference to a blessed object or a string containing a class name. This
is the function implementing the C<UNIVERSAL::isa> functionality.

	bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have 
to write:

	if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", TRUE);
    AV*  get_av("package::varname", TRUE);
    HV*  get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter.  The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<TRUE> argument to enable certain extra features.  Those bits are:

    GV_ADDMULTI	Marks the variable as multiply defined, thus preventing the
		"Name <varname> used only once: possible typo" warning.
    GV_ADDWARN	Issues the warning "Had to create <varname> unexpectedly" if
		the variable did not exist before the function was called.

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1.  If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten.  At the internal level, however, reference counts can be
manipulated with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument.  The C<newRV_inc> function, you will recall,
creates a reference to the specified argument.  As a side effect,
it increments the argument's reference count.  If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one.  Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two.  Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't!  Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates.  This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>.  Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.
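
A sketch of the leak-free version in XS might look like the following.
The XSUB name C<new_counter> is hypothetical; the point is only that the
freshly created SV ends up owned by the reference alone:

    SV *
    new_counter(start)
        IV start
    PREINIT:
        SV *sv;
    CODE:
        sv = newSViv(start);        /* refcount of sv is 1          */
        RETVAL = newRV_noinc(sv);   /* still 1: the ref now owns it */
    OUTPUT:
        RETVAL

Had C<newRV_inc> been used here, the count would be two and the integer
SV would outlive the last reference to it.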

There are some convenience functions available that can help with the
destruction of xVs.  These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later".  Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function.  The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, SAVETMPS and FREETMPS.
See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

You should be careful about creating mortal variables.  Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times.

To create a mortal variable, use the functions:

    SV*  sv_newmortal()
    SV*  sv_2mortal(SV*)
    SV*  sv_mortalcopy(SV*)

The first call creates a mortal SV, the second converts an existing
SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
third creates a mortal copy of an existing SV.

The mortal routines are not just for SVs -- AVs and HVs can be
made mortal by passing their address (type-casted to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.
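
For instance, a helper that builds a temporary array can mortalize it at
creation time, so that no explicit C<SvREFCNT_dec> is needed later.  This
is only a sketch; C<make_temp_pair> is a hypothetical name:

    /* Returns a mortal array holding two integers.  It is freed
     * automatically when the temporaries are next cleaned up,
     * unless something else takes a reference to it (and so
     * increments its count) first. */
    static AV *
    make_temp_pair(IV a, IV b)
    {
        AV *av = (AV*)sv_2mortal((SV*)newAV());
        av_push(av, newSViv(a));    /* the array owns these SVs */
        av_push(av, newSViv(b));
        return av;
    }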

=head2 Stashes and Globs

A "stash" is a hash that contains all of the different objects that
are contained within a package.  Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value).  This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called "PL_defstash" that holds the items that exist
in the "main" package.  To get at the items in other packages, append the
string "::" to the package name.  The items in the "Foo" package are in
the stash "Foo::" in PL_defstash.  The items in the "Bar::Baz" package are
in the stash "Baz::" in "Bar::"'s stash.

To get the stash pointer for a particular package, use one of the functions:

    HV*  gv_stashpv(const char* name, I32 create)
    HV*  gv_stashsv(SV*, I32 create)

The first function takes a literal string, the second uses the string stored
in the SV.  Remember that a stash is just a hash table, so you get back an
C<HV*>.  The C<create> flag will create a new package if it is set.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want.  The default package is called C<main>.  If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternately, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV*  SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char*  HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV*  sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash.  The returned C<SV*> can now be used in the same way
as any other SV.
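
Putting these calls together, a blessed hash reference could be built
roughly as follows.  The package name C<My::Class> and the function name
C<new_object> are assumptions made for the example:

    static SV *
    new_object(void)
    {
        /* TRUE: create the package if it does not exist yet */
        HV *stash = gv_stashpv("My::Class", TRUE);
        /* the reference becomes the sole owner of the new hash */
        SV *ref   = newRV_noinc((SV*)newHV());
        return sv_bless(ref, stash);
    }

Afterwards, C<HvNAME(SvSTASH(SvRV(ref)))> should yield the package name
the object was blessed into.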

For more information on references and blessings, consult L<perlref>.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference.  Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data.  For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data.  The
four macros to set the flags are:

	SvIOK_on
	SvNOK_on
	SvPOK_on
	SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first.  This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int  dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", TRUE);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Magic Variables

[This section still under construction.  Ignore everything here.  Post no
bills.  Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have.  These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        SV*         mg_obj;
        char*       mg_ptr;
        I32         mg_len;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the sv_magic function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
set the C<SVt_PVMG> flag for the C<sv>.  Perl then continues by adding
it to the beginning of the linked list of magical features.  Any prior
entry of the same type of magic is deleted.  Note that this can be
overridden, and multiple instances of the same type of magic can be
associated with an SV.

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field, and if C<name> is non-null and C<namlen> >= 0, a malloc'd
copy of the name is stored in the C<mg_ptr> field.

The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the "Magic Virtual Table" section below.  The C<how> argument is also
stored in the C<mg_type> field.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure.  If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented.  If it is the same, or if
the C<how> argument is "#", or if it is a NULL pointer, then C<obj> is
merely stored, without the reference count being incremented.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function sv_unmagic:

    void sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to a
C<MGVTBL> ("Magic Virtual Table"), a structure of function pointers
that handle the various operations that might be applied to that
variable.

The C<MGVTBL> has five pointers to the following routine types:

    int  (*svt_get)(SV* sv, MAGIC* mg);
    int  (*svt_set)(SV* sv, MAGIC* mg);
    U32  (*svt_len)(SV* sv, MAGIC* mg);
    int  (*svt_clear)(SV* sv, MAGIC* mg);
    int  (*svt_free)(SV* sv, MAGIC* mg);

This MGVTBL structure is set at compile-time in C<perl.h> and there are
currently 19 types (or 21 with overloading turned on).  These different
structures contain pointers to various routines that perform additional
actions depending on which function is being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something after the value of the SV is retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of '\0') contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type '\0', if a get
operation is being performed, the routine C<magic_get> is called.  All
the various routines for the various magical types begin with C<magic_>.
NOTE: the magic routines are not considered part of the Perl API, and may
not be exported by the Perl library.

The current kinds of Magic Virtual Tables are:

    mg_type  MGVTBL              Type of magic
    -------  ------              ----------------------------
    \0       vtbl_sv             Special scalar variable
    A        vtbl_amagic         %OVERLOAD hash
    a        vtbl_amagicelem     %OVERLOAD hash element
    c        (none)              Holds overload table (AMT) on stash
    B        vtbl_bm             Boyer-Moore (fast string search)
    D        vtbl_regdata        Regex match position data (@+ and @- vars)
    d        vtbl_regdatum       Regex match position data element
    E        vtbl_env            %ENV hash
    e        vtbl_envelem        %ENV hash element
    f        vtbl_fm             Formline ('compiled' format)
    g        vtbl_mglob          m//g target / study()ed string
    I        vtbl_isa            @ISA array
    i        vtbl_isaelem        @ISA array element
    k        vtbl_nkeys          scalar(keys()) lvalue
    L        (none)              Debugger %_<filename 
    l        vtbl_dbline         Debugger %_<filename element
    o        vtbl_collxfrm       Locale transformation
    P        vtbl_pack           Tied array or hash
    p        vtbl_packelem       Tied array or hash element
    q        vtbl_packelem       Tied scalar or handle
    S        vtbl_sig            %SIG hash
    s        vtbl_sigelem        %SIG hash element
    t        vtbl_taint          Taintedness
    U        vtbl_uvar           Available for use by extensions
    v        vtbl_vec            vec() lvalue
    x        vtbl_substr         substr() lvalue
    y        vtbl_defelem        Shadow "foreach" iterator variable /
                                  smart parameter vivification
    *        vtbl_glob           GV (typeglob)
    #        vtbl_arylen         Array length ($#ary)
    .        vtbl_pos            pos() lvalue
    ~        (none)              Available for use by extensions

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is used to represent some kind of composite type (a list
or a hash), and the lowercase letter is used to represent an element of
that composite type.

The '~' and 'U' magic types are defined specifically for use by
extensions and will not be used by perl itself.  Extensions can use
'~' magic to 'attach' private information to variables (typically
objects).  This is especially useful because there is no way for
normal perl code to corrupt this private information (unlike using
extra elements of a hash object).

Similarly, 'U' magic can be used much like tie() to call a C function
any time a scalar's value is used or changed.  The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(IV, SV*);
        I32 (*uf_set)(IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a
pointer to the SV as the second.  A simple example of how to add 'U'
magic is shown below.  Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf));

Note that because multiple extensions may be using '~' or 'U' magic,
it is important for extensions to take extra care to avoid conflict.
Typically only using the magic on objects blessed into the same class
as the extension is sufficient.  For '~' magic, it may also be
appropriate to add an I32 'signature' at the top of the private data
area and check that.

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets.  This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions.  Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.
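
A short sketch of the two alternatives for appending to an SV that may
carry 'set' magic (a tied scalar, for example).  Here C<sv> is assumed to
be an C<SV*> handed in by the caller, and only one of the two
alternatives would be used in real code:

    /* Alternative 1: plain function, then trigger the magic
     * yourself. */
    sv_catpv(sv, " (done)");
    SvSETMAGIC(sv);

    /* Alternative 2: the _mg variant performs both steps. */
    sv_catpv_mg(sv, " (done)");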

=head2 Finding Magic

    MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */

This routine returns a pointer to the C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned.  Also,
if the SV is not of type SVt_PVMG, Perl may core dump.
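
As an illustration, the following sketch attaches a small private
structure to an SV with '~' magic and later retrieves it with
C<mg_find>, checking a signature as suggested earlier.  The structure,
the signature value, and both function names are hypothetical:

    #define MY_SIGNATURE 0x4D794578     /* arbitrary, made-up tag */

    typedef struct {
        I32 signature;
        int handle;
    } my_private;

    static void
    attach_private(SV *sv, int handle)
    {
        my_private priv;
        priv.signature = MY_SIGNATURE;
        priv.handle    = handle;
        /* namlen >= 0, so sv_magic keeps a malloc'd copy in mg_ptr */
        sv_magic(sv, NULL, '~', (char*)&priv, sizeof(priv));
    }

    static my_private *
    find_private(SV *sv)
    {
        MAGIC *mg;
        my_private *priv;

        if (SvTYPE(sv) < SVt_PVMG)      /* too small to hold magic */
            return NULL;
        mg = mg_find(sv, '~');
        if (!mg || !mg->mg_ptr
            || mg->mg_len != (I32)sizeof(my_private))
            return NULL;
        priv = (my_private*)mg->mg_ptr;
        return priv->signature == MY_SIGNATURE ? priv : NULL;
    }

The C<SvTYPE> test guards against the core dump mentioned above, and the
length and signature checks reduce the risk of picking up '~' magic that
another extension attached.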

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has.  If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the 'P' magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats.  Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET etc methods.  To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour.  The code below
carries out the necessary steps - firstly it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash.  Note that the code below does NOT call the
TIEHASH method in the MyTie class; see
L<Calling Perl Routines from within C Programs> for details on how to do
this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", TRUE);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, 'P');
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>.  It may also return NULL, indicating that the value did not
actually need to be stored in the array.  [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object.  If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>.  Note the value so returned does not
need to be deallocated, as it is already mortal.  [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object.  Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>,  which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes.  They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched".  Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects.  Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants.  The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax.  The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

  {
    local $var = 2;
    ...
  }

This construction is I<approximately> equivalent to

  {
    my $oldvar = $var;
    $var = 2;
    ...
    $var = $oldvar;
  }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval> etc. It is a little bit
more efficient as well.

There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, whether it is left explicitly or via a non-local
exit (via die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like the boundaries of the enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization should
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.

Inside such a I<pseudo-block> the following services are available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

These macros arrange things to restore the value of the integer variable
C<i> at the end of the enclosing I<pseudo-block>.

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of
I<pseudo-block>.  This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>.  However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope.  These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count.  This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of I<pseudo-block>.

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed.  If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

  SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=back
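
As a sketch of how these macros are typically combined with an explicit
C<ENTER>/C<LEAVE> pair, the following helper temporarily zeroes a C
integer while some other code runs.  The variable C<verbose_level> and
the function name are hypothetical:

    static int verbose_level = 1;   /* some extension-wide setting */

    static void
    run_quietly(void (*work)(void))
    {
        ENTER;                      /* open a pseudo-block          */
        SAVEINT(verbose_level);     /* remember the current value   */
        verbose_level = 0;
        work();                     /* may return normally or die   */
        LEAVE;                      /* old value is restored here   */
    }

If C<work> exits via die(), the saved value is still restored while Perl
unwinds to the enclosing C<eval>, which is the main advantage over
saving and restoring the variable by hand.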

The following API list contains functions rather than macros, so one
needs to provide pointers to the modifiable data explicitly (either C
pointers, or Perlish C<GV *>s).  Where the above macros take C<int>, a
similar function takes C<int *>.

=over 4

=item C<SV* save_scalar(GV *gv)>

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=item C<HV* save_hash(GV *gv)>

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

Duplicates the current value of C<item>.  On exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<item> is restored
from the stored copy.

=item C<void save_list(SV **sarg, I32 maxsarg)>

A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.

=item C<SV* save_svref(SV **sptr)>

Similar to C<save_scalar>, but will reinstate a C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back
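
For example, the C equivalent of C<local $DEBUG = 0> around a piece of
work could be sketched as follows.  The variable name C<main::DEBUG> and
the function name are assumptions made for the example; C<gv_fetchpv> is
used here to obtain the C<GV *>:

    static void
    with_debug_off(void (*work)(void))
    {
        /* TRUE: create the glob if it does not exist yet */
        GV *gv = gv_fetchpv("main::DEBUG", TRUE, SVt_PV);

        ENTER;
        /* save_scalar() returns the fresh, localized SV */
        sv_setiv(save_scalar(gv), 0);
        work();
        LEAVE;              /* the previous $DEBUG is back */
    }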

The C<Alias> module implements localization of the basic types within the
I<caller's scope>.  People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument.  Argument 0 is the first argument passed in the
Perl subroutine call.  These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives.  However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
macros to push IVs, doubles, strings, and SV pointers respectively:

    PUSHi(IV)
    PUSHn(double)
    PUSHp(char*, I32)
    PUSHs(SV*)

Now, in the Perl program calling C<tzname>, the two values can be assigned
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
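
On the C side, an XSUB along these lines could produce those two values.
This is a simplified sketch based on the description above, not
necessarily the code the POSIX module itself uses.  The XSUB name
C<my_tzname> is hypothetical, and the sketch assumes that
C<< <time.h> >> declares the C<tzname> array on your platform.  Each
value gets its own mortal SV, pushed with C<PUSHs>:

    void
    my_tzname()
    PPCODE:
        EXTEND(SP, 2);      /* make room for two return values */
        /* a length of 0 tells newSVpv to use strlen() */
        PUSHs(sv_2mortal(newSVpv(tzname[0], 0)));
        PUSHs(sv_2mortal(newSVpv(tzname[1], 0)));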

An alternate (and possibly simpler) method of pushing values on the stack is
to use the macros:

    XPUSHi(IV)
    XPUSHn(double)
    XPUSHp(char*, I32)
    XPUSHs(SV*)

These macros automatically adjust the stack for you, if needed.  Thus, you
do not need to call C<EXTEND> to extend the stack.
However, see L</Putting a C value on Perl stack> for a caveat about
pushing several values with some of these macros.

For more information, consult L<perlxs> and L<perlxstut>.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program.  These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, register char**);

The routine most often used is C<call_sv>.  The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine.  The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of arguments that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv> etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack.  This involves the following macros and
functions:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.
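
To show how these macros fit together, here is a sketch that calls a
Perl subroutine taking two integers and returning one.  The Perl sub
C<main::add> and the C function name C<call_add> are hypothetical:

    static IV
    call_add(IV a, IV b)
    {
        dSP;                        /* local copy of stack pointer  */
        I32 count;
        IV  result;

        ENTER;
        SAVETMPS;                   /* start a scope for mortals    */

        PUSHMARK(SP);               /* remember where args begin    */
        XPUSHs(sv_2mortal(newSViv(a)));
        XPUSHs(sv_2mortal(newSViv(b)));
        PUTBACK;                    /* publish local SP to Perl     */

        count = call_pv("main::add", G_SCALAR);

        SPAGAIN;                    /* refresh local SP after call  */
        if (count != 1)
            croak("main::add returned %d values, expected 1",
                  (int)count);
        result = POPi;
        PUTBACK;

        FREETMPS;                   /* free the mortal arguments    */
        LEAVE;
        return result;
    }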

=head2 Memory Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section.  The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

It is suggested that you enable the version of malloc that is distributed
with Perl.  It keeps pools of various sizes of unallocated memory in
order to satisfy allocation requests more quickly.  However, on some
platforms, it may cause spurious malloc or free errors.

    New(x, pointer, number, type);
    Newc(x, pointer, number, type, cast);
    Newz(x, pointer, number, type);

These three macros are used to initially allocate memory.

The first argument C<x> was a "magic cookie" that was used to keep track
of who called the macro, to help when debugging memory problems.  However,
the current code makes no use of this feature (most Perl developers now
use run-time memory checkers), so this argument can be any number.

The second argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The third and fourth arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated.  The argument
C<type> is passed to C<sizeof>.  The final argument to C<Newc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero>
to zero out all the newly allocated memory.

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer)

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed.  The arguments to C<Renew> and C<Renewc>
match those of C<New> and C<Newc> with the exception of not needing the
"magic cookie" argument.

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory.  The C<source> and C<dest> arguments point to the source and
destination starting points.  Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
function).
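
A small sketch using these macros together follows.  The function name
C<copy_doubled> is hypothetical; it returns a buffer holding two copies
of the input, which the caller is expected to release with C<Safefree>:

    static char *
    copy_doubled(const char *src, STRLEN len)
    {
        char *buf;

        New(0, buf, len + 1, char);     /* cookie value is ignored */
        Copy(src, buf, len, char);      /* source first, then dest */
        buf[len] = '\0';

        Renew(buf, 2 * len + 1, char);  /* grow; contents are kept */
        Copy(buf, buf + len, len, char);/* halves do not overlap   */
        buf[2 * len] = '\0';

        return buf;
    }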

=head2 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used.  This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with.  All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head2 Putting a C value on Perl stack

A lot of opcodes (an opcode is an elementary operation in the internal perl
stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L<Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on stack.

The macro to put this target on stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[pni]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack. The following code will not do what you think:

    XPUSHi(10);
    XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20. If you need to push multiple different values, use C<XPUSHs>,
which bypasses C<TARG>.
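
Assuming the code runs inside a C<PPCODE> section, a version that does
return 10 and 20 could look like this sketch, in which each value gets
its own mortal SV instead of sharing C<TARG>:

    XPUSHs(sv_2mortal(newSViv(10)));
    XPUSHs(sv_2mortal(newSViv(20)));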

On a related note, if you do use C<(X)PUSH[npi]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>. 

=head2 Scratchpads

The question remains of when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current unit --
a subroutine or a file (for opcodes for statements outside of
subroutines) -- is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current
unit.

A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes. One can deduce that an SV lives on a scratchpad
by looking at its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set.

The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

Strictly speaking, it is not 100% true that a compiled unit contains a
pointer to the scratchpad AV. In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe (sometime soon) B<threads>. Both
of these can create several execution pointers going into the same
subroutine. For the subroutine-child not to write over the temporaries
of the subroutine-parent (whose lifespan covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed onto the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.
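
The depth check can be sketched with a small model in plain C. This is
only an illustration under simplifying assumptions: C<ToySub>, C<ToyPad>,
C<toy_sub_init>, C<toy_enter> and C<toy_leave> are hypothetical names,
and a pad here is just a few C<long> slots rather than an AV of SVs.

```c
#include <stdlib.h>

#define TOY_PAD_SLOTS 4

typedef struct { long slot[TOY_PAD_SLOTS]; } ToyPad;

typedef struct {
    ToyPad **pads;   /* one scratchpad per recursion depth */
    size_t   npads;  /* current length of the array */
    size_t   depth;  /* current recursion depth, 0 when not running */
} ToySub;

/* A subroutine is "born" with an array holding exactly one pad. */
static void toy_sub_init(ToySub *sub)
{
    sub->pads = malloc(sizeof *sub->pads);
    if (!sub->pads) abort();
    sub->pads[0] = calloc(1, sizeof(ToyPad));
    if (!sub->pads[0]) abort();
    sub->npads = 1;
    sub->depth = 0;
}

/* On entry: if the depth exceeds the array length, push a new pad. */
static ToyPad *toy_enter(ToySub *sub)
{
    sub->depth++;
    if (sub->depth > sub->npads) {
        ToyPad **grown = realloc(sub->pads, sub->depth * sizeof *grown);
        if (!grown) abort();
        sub->pads = grown;
        sub->pads[sub->npads] = calloc(1, sizeof(ToyPad));
        if (!sub->pads[sub->npads]) abort();
        sub->npads++;
    }
    return sub->pads[sub->depth - 1];
}

/* On exit the pad is kept, so a later call at this depth reuses it. */
static void toy_leave(ToySub *sub)
{
    sub->depth--;
}
```

A recursive call gets its own pad, so writing to the inner pad leaves the
outer one untouched, and the pads created for deeper levels are reused
by later calls that reach the same depth.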

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

  $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated).  This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes.  In our simplified
example above it looks like:

     $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>.  As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.
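
The difference between the parse tree and the execution "thread" can be
modelled in a few lines of plain C. The sketch below is not Perl's own
linking code: C<ToyOp> and C<toy_link> are hypothetical, and the thread
is built by a plain post-order walk (children before parent), which is
enough to reproduce the order shown for the simplified example.

```c
#include <stddef.h>

typedef struct ToyOp {
    const char   *name;
    struct ToyOp *first;    /* first child, or NULL */
    struct ToyOp *sibling;  /* next child of the same parent, or NULL */
    struct ToyOp *next;     /* next op in execution order */
} ToyOp;

/* Thread the subtree rooted at o in post-order.  *tail is the last op
 * threaded so far (NULL at the start); the return value is the first
 * op to execute within this subtree. */
static ToyOp *toy_link(ToyOp *o, ToyOp **tail)
{
    ToyOp *start = NULL;

    for (ToyOp *kid = o->first; kid; kid = kid->sibling) {
        ToyOp *kid_start = toy_link(kid, tail);
        if (!start)
            start = kid_start;
    }
    if (*tail)
        (*tail)->next = o;  /* run o right after the previous op */
    *tail = o;
    o->next = NULL;
    return start ? start : o;
}
```

Building the five-node tree for C<$a = $b + $c> and calling C<toy_link>
on its root yields the chain C<$b>, C<$c>, C<+>, C<$a>, C<assign-to>.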

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with C<-D
optimize=-g> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line.  The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), but only 3 of them
are not optimized away (one per number in the left column).  The immediate
children of a given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, thus it is C<3
4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the last.

There are also two other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass.  This is optimization by
so-called "check routines".  The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread.  Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, the check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines (or C<convert>) (which in turn are
called from F<perly.y>).

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called the returned node is
checked for being compile-time executable.  If it is (the value is
judged to be constant) it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead.  The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.
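
The idea of folding can be shown with a toy expression node in plain C.
This is a sketch under heavy simplification, not what Perl does
internally: C<ToyNode> and C<toy_fold> are hypothetical, only addition
is handled, and the value is computed directly rather than by actually
executing the subtree.

```c
#include <stddef.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD } ToyKind;

typedef struct ToyNode {
    ToyKind         kind;
    long            value;         /* meaningful only for TOY_CONST */
    struct ToyNode *left, *right;  /* meaningful only for TOY_ADD */
} ToyNode;

/* Called on a freshly built node whose children were already folded.
 * If both operands are constants, the node becomes a constant holding
 * the "return value" and its subtree is dropped. */
static ToyNode *toy_fold(ToyNode *n)
{
    if (n->kind == TOY_ADD
        && n->left->kind == TOY_CONST
        && n->right->kind == TOY_CONST)
    {
        n->value = n->left->value + n->right->value;
        n->kind  = TOY_CONST;
        n->left  = NULL;
        n->right = NULL;
    }
    return n;
}
```

A node for C<2 + 3> is turned into the constant 5, while a node with a
variable operand is returned unchanged.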

=head2 Compile pass 2: context propagation

When the context for a part of the compile tree is known, it is propagated
down through the tree.  At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue.  In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now.  To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of free()ing (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals).  These optimizations are
done in the subroutine peep().  Optimizations performed at this stage
are subject to the same restrictions as in pass 2.

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format. 

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
op tree, C<Perl_dump_packsubs>, which calls C<Perl_dump_sub> on all the
subroutines in a package like so (thankfully, these are all xsubs, so
there is no op tree):

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

and C<Perl_dump_all>, which dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use.  This smells a lot like an object, and
there are ways for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C++ object,
a C structure, or inside a thread.  The thread, the C structure, or
the C++ object will contain all the context, the state of that
interpreter.

Three macros control the major Perl build flavors: MULTIPLICITY,
USE_THREADS and PERL_OBJECT.  The MULTIPLICITY build has a C structure
that packages all the interpreter state, there is a similar thread-specific
data structure under USE_THREADS, and the (now deprecated) PERL_OBJECT
build has a C++ class to maintain interpreter state.  In all three cases,
PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
support for passing in a "hidden" first argument that represents all three
data structures.

All this obviously requires a way for the Perl internal functions to be
C++ methods, subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument.  To
enable these three very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private.  All functions whose names begin C<S_> are private 
(think "S" for "secret" or "static").  All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a 
function is part of the API is to find its entry in L<perlapi>.  
If it exists in L<perlapi>, it's part of the API.  If it doesn't, and you 
think it should be (i.e., you need it for your extension), send mail via 
L<perlbug> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing.  To solve this, the subroutines are named and
declared in a particular way.  Here's a typical start of a static
function used within the Perl guts:

  STATIC void
  S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and is #define'd to nothing in C++.

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

  void
  Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv)

C<pTHX_> is one of a number of macros (in perl.h) that hide the
details of the interpreter's context.  THX stands for "thread", "this",
or "thingy", as the case may be.  (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
first argument containing the interpreter's context.  The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it.  If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument.  The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.
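
The way the trailing-underscore macros expand can be tried out with a
toy set of macros. The sketch below is not taken from F<perl.h>:
C<MY_IMPLICIT_CONTEXT>, C<MyInterp>, C<pMY>, C<pMY_>, C<aMY> and C<aMY_>
are invented names that only mimic the pattern described above.

```c
/* Toy versions of the context macros; all MY_/my_ names are hypothetical.
 * Comment out the next line to build the "no implicit context" flavor. */
#define MY_IMPLICIT_CONTEXT 1

typedef struct { int calls; } MyInterp;

#ifdef MY_IMPLICIT_CONTEXT
#  define pMY   MyInterp *my_interp   /* prototype, no further arguments */
#  define pMY_  pMY,                  /* prototype, more arguments follow */
#  define aMY   my_interp             /* argument, no further arguments */
#  define aMY_  aMY,                  /* argument, more arguments follow */
#else
static MyInterp my_global_interp;
#  define pMY   void
#  define pMY_
#  define aMY
#  define aMY_
#  define my_interp (&my_global_interp)
#endif

/* Takes explicit arguments, so it uses the underscore form. */
static int my_add(pMY_ int a, int b)
{
    my_interp->calls++;
    return a + b;
}

/* Takes no explicit arguments, so it uses the form without underscore. */
static int my_calls(pMY)
{
    return my_interp->calls;
}

/* A function calling another passes the context along with aMY_. */
static int my_add_three(pMY_ int a, int b, int c)
{
    return my_add(aMY_ my_add(aMY_ a, b), c);
}
```

With the context enabled, C<my_add(pMY_ int a, int b)> is declared as
C<my_add(MyInterp *my_interp, int a, int b)>; without it, the macros
vanish and the same source declares C<my_add(int a, int b)>.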

When a core function calls another, it must pass the context.  This
is normally hidden via macros.  Consider C<sv_setsv>.  It expands into
something like this:

    #ifdef PERL_IMPLICIT_CONTEXT
    #  define sv_setsv(a,b)      Perl_sv_setsv(aTHX_ a, b)
       /* can't do this for vararg functions, see below */
    #else
    #  define sv_setsv           Perl_sv_setsv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setsv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

Under PERL_OBJECT in the core, that will translate to either:

    CPerlObj::Perl_sv_setsv(foo,bar);  # in CPerlObj functions,
                                       # C++ takes care of 'this'
  or

    pPerl->Perl_sv_setsv(foo,bar);     # in truly static functions,
                                       # see objXSUB.h

Under PERL_OBJECT in extensions (aka PERL_CAPI), or under
MULTIPLICITY/USE_THREADS with PERL_IMPLICIT_CONTEXT in both core
and extensions, it will become:

    Perl_sv_setsv(aTHX_ foo, bar);     # the canonical Perl "API"
                                       # for all build flavors

This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance.  Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument.  Instead
it does dTHX; to get the context from thread-local storage.  We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance.  (Passing an arg is
cheaper than grabbing it from thread-local storage.)

You can ignore [pad]THX[xo] when browsing the Perl headers/sources.
Those are strictly for use within the core.  Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more.  Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow.  The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with PERL_IMPLICIT_CONTEXT enabled.

There are three ways to do this.  First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever XSUB.h is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

        sv_setsv(asv, bsv);

in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:

        Perl_sv_setsv(Perl_get_context(), asv, bsv);

or to this otherwise:

        Perl_sv_setsv(asv, bsv);

You have to do nothing new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your Foo.xs:

        #define PERL_NO_GET_CONTEXT     /* we want efficiency */
        #include "EXTERN.h"
        #include "perl.h"
        #include "XSUB.h"

        static SV *my_private_function(int arg1, int arg2);

        static SV *
        my_private_function(int arg1, int arg2)
        {
            dTHX;       /* fetch context */
            ... call many Perl API functions ...
        }

        [... etc ...]

        MODULE = Foo            PACKAGE = Foo

        /* typical XSUB */

        void
        my_xsub(arg)
                int arg
            CODE:
                my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API.  (You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.)  No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.

The third, even more efficient way is to ape how it is done within
the Perl guts:


        #define PERL_NO_GET_CONTEXT     /* we want efficiency */
        #include "EXTERN.h"
        #include "perl.h"
        #include "XSUB.h"

        /* pTHX_ only needed for functions that call Perl API */
        static SV *my_private_function(pTHX_ int arg1, int arg2);

        static SV *
        my_private_function(pTHX_ int arg1, int arg2)
        {
            /* dTHX; not needed here, because THX is an argument */
            ... call Perl API functions ...
        }

        [... etc ...]

        MODULE = Foo            PACKAGE = Foo

        /* typical XSUB */

        void
        my_xsub(arg)
                int arg
            CODE:
                my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument.  Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards.  If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter.  This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

	/* do this before doing anything else with some_perl */
	PERL_SET_CONTEXT(some_perl);

	... other Perl API calls on some_perl go here ...

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on.  This is enabled with the
PERL_IMPLICIT_SYS macro.  Currently it only works with PERL_OBJECT
and USE_THREADS on Windows (see inside iperlsys.h).

This makes it possible to provide an extra pointer (called the "host"
environment) for all the system calls, so that the system-level
code can maintain its own state, broken down into
seven C structures.  These are thin wrappers around the usual system
calls (see win32/perllib.c) for the default perl executable, but for a
more ambitious host (like the one that would do fork() emulation) all
the extra work needed to pretend that different interpreters are
actually different "processes" would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core, you can get at the functions either with or
without the C<Perl_> prefix, thanks to a bunch of defines that live in
F<embed.h>. This header file is generated automatically from
F<embed.pl>. F<embed.pl> also creates the prototyping header files for
the internal functions, generates the documentation and a lot of other
bits and pieces. It's important that when you add a new function to the
core or change an existing one, you change the data in the table at the
end of F<embed.pl> as well. Here's a sample entry from that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The second column is the return type, the third column the name. Columns
after that are the arguments. The first column is a set of flags:

=over 3

=item A

This function is a part of the public API.

=item p

This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>.

=item d

This function has documentation using the C<apidoc> feature which we'll
look at in a second.

=back

Other available flags are:

=over 3

=item s

This is a static function and is defined as C<S_whatever>, and usually
called within the sources as C<whatever(...)>.

=item n

This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See
L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)

=item r

This function never returns; C<croak>, C<exit> and friends.

=item f

This function takes a variable number of arguments, C<printf> style.
The argument list should end with C<...>, like this:

    Afprd   |void   |croak          |const char* pat|...

=item M

This function is part of the experimental development API, and may change 
or disappear without notice.

=item o

This function should not have a compatibility macro to define, say,
C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.

=item j

This function is not a member of C<CPerlObj>. If you don't know
what this means, don't use it.

=item x

This function isn't exported out of the Perl core.

=back

If you edit F<embed.pl>, you will need to run C<make regen_headers> to
force a rebuild of F<embed.h> and other auto-generated files.

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3)-style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

        IVdf            IV in decimal
        UVuf            UV in decimal
        UVof            UV in octal
        UVxf            UV in hexadecimal
        NVef            NV %e-like
        NVff            NV %f-like
        NVgf            NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

        printf("IV is %"IVdf"\n", iv);

The IVdf will expand to whatever is the correct format for the IVs.
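
Standard C uses the same string-pasting trick in C<E<lt>inttypes.hE<gt>>,
which makes it easy to try the pattern outside the Perl source tree. In
the sketch below C<ToyIV>, C<ToyIVdf> and C<toy_format_iv> are
hypothetical; it assumes a 64-bit integer type, whereas the real C<IVdf>
is whatever F<Configure> determined for your build.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins: pretend our "IV" is a 64-bit integer and
 * define its format macro the way IVdf is meant to be used. */
typedef int64_t ToyIV;
#define ToyIVdf PRId64

/* Format iv into buf; adjacent string literals are concatenated, so
 * "IV is %" ToyIVdf becomes one format string with the right length
 * modifier for the platform.  Returns what snprintf returns. */
static int toy_format_iv(char *buf, size_t size, ToyIV iv)
{
    return snprintf(buf, size, "IV is %" ToyIVdf, iv);
}
```

The caller never has to know whether the underlying type is C<long>
or C<long long>; only the macro changes between platforms.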

If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

        PTR2UV(pointer)
        PTR2IV(pointer)
        PTR2NV(pointer)
        INT2PTR(pointertotype, integer)

For example:

        IV  iv = ...;
        SV *sv = INT2PTR(SV*, iv);

and

        AV *av = ...;
        UV  uv = PTR2UV(av);
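
For comparison, plain C offers C<uintptr_t> for the same round trip. The
following sketch is only an analogy and assumes nothing about how
F<perl.h> defines the macros above: C<ToyAV>, C<toy_ptr2uv> and
C<toy_int2ptr> are hypothetical names.

```c
#include <stdint.h>

typedef struct { int payload; } ToyAV;

/* Pointer to integer: uintptr_t is wide enough to hold any object
 * pointer, which a plain int or long is not guaranteed to be. */
static uintptr_t toy_ptr2uv(const void *p)
{
    return (uintptr_t)p;
}

/* Integer back to pointer; the caller assigns it to the right type. */
static void *toy_int2ptr(uintptr_t u)
{
    return (void *)u;
}
```

Converting a pointer with C<toy_ptr2uv> and back with C<toy_int2ptr>
gives a pointer that compares equal to the original.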

=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them - L<perlapi> is one
such manual which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

 /*
 =for apidoc sv_setiv

 Copies an integer into the given SV.  Does not handle 'set' magic.  See
 C<sv_setiv_mg>.

 =cut
 */

Please try and supply some documentation if you add functions to the
Perl core.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF8. UTF8 uses
a variable number of bytes to represent a character, instead of just
one. You can learn more about Unicode at http://www.unicode.org/

=head2 How can I recognise a UTF8 string?

You can't. This is because UTF8 data is stored in bytes just like
non-UTF8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking - this
is what makes Unicode input an interesting problem.

The API function C<is_utf8_string> can help; it'll tell you if a string
contains only valid UTF8 characters. However, it can't do the work for
you. On a character-by-character basis, C<is_utf8_char> will tell you
whether the current character in a string is valid UTF8.

=head2 How does UTF8 represent Unicode characters?

As mentioned above, UTF8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>; this
continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
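
The byte patterns are easy to check with a small encoder in plain C.
This is a sketch of the standard UTF-8 bit layout for one-, two- and
three-byte sequences only; C<toy_encode_utf8> is a hypothetical helper,
not a Perl API function, and it assumes code points below 0x10000.

```c
#include <stddef.h>

/* Encode code point cp (assumed to be below 0x10000) as UTF-8 into out,
 * which must have room for 3 bytes.  Returns the number of bytes used. */
static size_t toy_encode_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Feeding it 191 gives the bytes 194 and 191, 192 gives 195 and 128, and
2048 is the first value that needs three bytes.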

Assuming you know you're dealing with a UTF8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.
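
If you want to see what a bounds-checked walk looks like, here is a
sketch in plain C. It is not the implementation of C<UTF8SKIP> or
C<utf8_hop>: C<toy_utf8_skip> and C<toy_utf8_hop> are hypothetical
helpers that derive the length from the lead byte and refuse to step
past an explicit end pointer.

```c
#include <stddef.h>

/* Length of the character starting at lead, judged by its high bits. */
static size_t toy_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;  /* continuation or invalid byte: step over it */
}

/* Advance n characters from s without going past end.  Returns the new
 * position, or NULL if the string holds fewer than n whole characters. */
static const unsigned char *
toy_utf8_hop(const unsigned char *s, const unsigned char *end, size_t n)
{
    while (n--) {
        size_t len;
        if (s >= end)
            return NULL;
        len = toy_utf8_skip(*s);
        if ((size_t)(end - s) < len)
            return NULL;
        s += len;
    }
    return s;
}
```

On the five-byte string from the example above, hopping one character
moves 2 bytes, hopping two reaches the end, and hopping three fails.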

All bytes in a multi-byte UTF8 character will have the high bit set, so
you can test if you need to do something special with this character
like this:

    UV uv;

    if (*utf & 0x80)
        /* Must treat this as UTF8 */
        uv = utf8_to_uv(utf);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uv> to get the
value of the character; the inverse function C<uv_to_utf8> is available
for putting a UV into UTF8:

    if (uv >= 0x80)
        /* Must treat this as UTF8 */
        utf8 = uv_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF8 and non-UTF8
characters. You may not skip over UTF8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF8 characters;
for instance, if your UTF8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF8 string.
So don't do that!

=head2 How does Perl store UTF8 strings?

Currently, Perl deals with Unicode strings and non-Unicode strings
slightly differently. If a string has been identified as being UTF-8
encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
manipulate this flag with the following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF8, and contains a byte sequence that could be UTF8 -
especially when combining non-UTF8 and UTF8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set, and act accordingly:

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
    if (SvUTF8(sv))
        SvUTF8_on(nsv);

In fact, your C<frobnicate> function should be made aware of whether or
not it's dealing with UTF8 data, so that it can handle the string
appropriately.

=head2 How do I convert a string to UTF8?

If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
to upgrade one of the strings to UTF8. If you've got an SV, the easiest
way to do this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems.

Instead, C<bytes_to_utf8> will give you a UTF8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.
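
The "copy, don't modify" idea can be illustrated in plain C. The sketch
below is not the source of C<bytes_to_utf8> and makes no claim about its
exact signature: C<toy_bytes_to_utf8> is a hypothetical helper that
treats each input byte as a character 0..255 and returns a newly
allocated UTF8 copy, leaving the input untouched.

```c
#include <stdlib.h>

/* Returns a malloc'ed, NUL-terminated UTF-8 copy of the len bytes at
 * src, or NULL if allocation fails.  *outlen receives the length of the
 * copy in bytes.  The caller frees the result. */
static unsigned char *
toy_bytes_to_utf8(const unsigned char *src, size_t len, size_t *outlen)
{
    /* Worst case: every byte is >= 0x80 and doubles in size. */
    unsigned char *dst = malloc(2 * len + 1);
    unsigned char *d = dst;

    if (!dst)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            *d++ = src[i];
        } else {
            *d++ = (unsigned char)(0xC0 | (src[i] >> 6));
            *d++ = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    *d = '\0';
    *outlen = (size_t)(d - dst);
    return dst;
}
```

Because the result is a separate buffer, it can be used for a
comparison and then freed, while the original bytes stay as they were.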

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a string is UTF8 or not. You can tell if an SV
is UTF8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
something should be UTF8. Treat the flag as part of the PV, even though
it's not - if you pass on the PV to somewhere, pass on the flag too.

=item *

If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
unless C<!(*s & 0x80)> in which case you can use C<*s>.

=item *

When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
C<uv < 0x80> in which case you can use C<*s = uv>.

=item *

Mixing UTF8 and non-UTF8 strings is tricky. Use C<bytes_to_utf8> to get
a new string which is UTF8 encoded. There are tricks you can use to
delay deciding whether you need to use a UTF8 string until you get to a
high character - C<HALF_UPGRADE> is one of those.

=back
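
The two shortcuts in the list above (C<!(*s & 0x80)> when reading,
C<uv < 0x80> when writing) can be seen in a small stand-alone sketch.
The functions C<toy_uv_to_utf8> and C<toy_utf8_to_uv> below are
hypothetical stand-ins written for this illustration, not Perl's own
C<uv_to_utf8> and C<utf8_to_uv>; the decoder assumes well-formed input
and both handle code points only up to 0x10FFFF.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long toy_uv;

/* Write the UTF-8 encoding of uv at d and return a pointer just past
 * the last byte written. */
unsigned char *
toy_uv_to_utf8(unsigned char *d, toy_uv uv)
{
    assert(d != NULL && uv <= 0x10FFFFUL);
    if (uv < 0x80) {
        *d++ = (unsigned char)uv;          /* fast path: plain *s = uv */
    } else if (uv < 0x800) {
        *d++ = (unsigned char)(0xC0 | (uv >> 6));
        *d++ = (unsigned char)(0x80 | (uv & 0x3F));
    } else if (uv < 0x10000UL) {
        *d++ = (unsigned char)(0xE0 | (uv >> 12));
        *d++ = (unsigned char)(0x80 | ((uv >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (uv & 0x3F));
    } else {
        *d++ = (unsigned char)(0xF0 | (uv >> 18));
        *d++ = (unsigned char)(0x80 | ((uv >> 12) & 0x3F));
        *d++ = (unsigned char)(0x80 | ((uv >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (uv & 0x3F));
    }
    return d;
}

/* Decode the character starting at s and store the number of bytes it
 * occupies in *retlen.  Assumes s points at well-formed UTF-8. */
toy_uv
toy_utf8_to_uv(const unsigned char *s, size_t *retlen)
{
    toy_uv uv;
    size_t len, i;

    assert(s != NULL && retlen != NULL);
    if (!(*s & 0x80)) {                    /* fast path: just use *s */
        *retlen = 1;
        return *s;
    }
    if ((*s & 0xE0) == 0xC0) {
        len = 2; uv = *s & 0x1F;
    } else if ((*s & 0xF0) == 0xE0) {
        len = 3; uv = *s & 0x0F;
    } else {
        len = 4; uv = *s & 0x07;
    }
    for (i = 1; i < len; i++)
        uv = (uv << 6) | (s[i] & 0x3F);    /* six payload bits per byte */
    *retlen = len;
    return uv;
}
```

Note how the caller always learns the byte length of the character it
read or wrote; stepping through a UTF8 string one byte at a time is
exactly the mistake these helpers exist to prevent.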

=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
<okamoto@corp.hp.com>.  It is now maintained as part of Perl itself
by the Perl 5 Porters <perl5-porters@perl.org>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

API Listing originally by Dean Roehrich <roehrich@cray.com>.

Modifications to autogenerate the API listing (L<perlapi>) by Benjamin
Stuhl.

=head1 SEE ALSO

perlapi(1), perlintern(1), perlxs(1), perlembed(1)
x X8! P|N  | !|x<`li  8cst| @A 0<`ae8cvt| @A  xK5T`?A                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 =head1 NAME

perlhack - How to hack at the Perl internals

=head1 DESCRIPTION

This document attempts to explain how Perl development takes place,
and ends with some suggestions for people wanting to become bona fide
porters.

The perl5-porters mailing list is where the Perl standard distribution
is maintained and developed.  The list can get anywhere from 10 to 150
messages a day, depending on the heatedness of the debate.  Most days
there are two or three patches, extensions, features, or bugs being
discussed at a time.

A searchable archive of the list is at:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

The list is also archived under the usenet group name
C<perl.porters-gw> at:

    http://www.deja.com/

List subscribers (the porters themselves) come in several flavours.
Some are quiet curious lurkers, who rarely pitch in and instead watch
the ongoing development to ensure they're forewarned of new changes or
features in Perl.  Some are representatives of vendors, who are there
to make sure that Perl continues to compile and work on their
platforms.  Some patch any reported bug that they know how to fix,
some are actively patching their pet area (threads, Win32, the regexp
engine), while others seem to do nothing but complain.  In other
words, it's your usual mix of technical people.

Over this group of porters presides Larry Wall.  He has the final word
in what does and does not change in the Perl language.  Various
releases of Perl are shepherded by a ``pumpking'', a porter
responsible for gathering patches and deciding, on a patch-by-patch,
feature-by-feature basis, what will and will not go into the release.
For instance, Gurusamy Sarathy is the pumpking for the 5.6 release of
Perl.

In addition, various people are pumpkings for different things.  For
instance, Andy Dougherty and Jarkko Hietaniemi share the I<Configure>
pumpkin, and Tom Christiansen is the documentation pumpking.

Larry sees Perl development along the lines of the US government:
there's the Legislature (the porters), the Executive branch (the
pumpkings), and the Supreme Court (Larry).  The legislature can
discuss and submit patches to the executive branch all they like, but
the executive branch is free to veto them.  Rarely, the Supreme Court
will side with the executive branch over the legislature, or the
legislature over the executive branch.  Mostly, however, the
legislature and the executive branch are supposed to get along and
work out their differences without impeachment or court cases.

You might sometimes see reference to Rule 1 and Rule 2.  Larry's power
as Supreme Court is expressed in The Rules:

=over 4

=item 1

Larry is always by definition right about how Perl should behave.
This means he has final veto power on the core functionality.

=item 2

Larry is allowed to change his mind about any matter at a later date,
regardless of whether he previously invoked Rule 1.

=back

Got that?  Larry is always right, even when he was wrong.  It's rare
to see either Rule exercised, but they are often alluded to.

New features and extensions to the language are contentious, because
the criteria used by the pumpkings, Larry, and other porters to decide
which features should be implemented and incorporated are not codified
in a few small design goals as with some other languages.  Instead,
the heuristics are flexible and often difficult to fathom.  Here is
one person's list, roughly in decreasing order of importance, of
heuristics that new features have to be weighed against:

=over 4

=item Does concept match the general goals of Perl?

These haven't been written anywhere in stone, but one approximation
is:

 1. Keep it fast, simple, and useful.
 2. Keep features/concepts as orthogonal as possible.
 3. No arbitrary limits (platforms, data sizes, cultures).
 4. Keep it open and exciting to use/patch/advocate Perl everywhere.
 5. Either assimilate new technologies, or build bridges to them.

=item Where is the implementation?

All the talk in the world is useless without an implementation.  In
almost every case, the person or people who argue for a new feature
will be expected to be the ones who implement it.  Porters capable
of coding new features have their own agendas, and are not available
to implement your (possibly good) idea.

=item Backwards compatibility

It's a cardinal sin to break existing Perl programs.  New warnings are
contentious--some say that a program that emits warnings is not
broken, while others say it is.  Adding keywords has the potential to
break programs; changing the meaning of existing token sequences or
functions might break programs as well.

=item Could it be a module instead?

Perl 5 has extension mechanisms, modules and XS, specifically to avoid
the need to keep changing the Perl interpreter.  You can write modules
that export functions, you can give those functions prototypes so they
can be called like built-in functions, you can even write XS code to
mess with the runtime data structures of the Perl interpreter if you
want to implement really complicated things.  If it can be done in a
module instead of in the core, it's highly unlikely to be added.

=item Is the feature generic enough?

Is this something that only the submitter wants added to the language,
or would it be broadly useful?  Sometimes, instead of adding a feature
with a tight focus, the porters might decide to wait until someone
implements the more generalized feature.  For instance, instead of
implementing a ``delayed evaluation'' feature, the porters are waiting
for a macro system that would permit delayed evaluation and much more.

=item Does it potentially introduce new bugs?

Radical rewrites of large chunks of the Perl interpreter have the
potential to introduce new bugs.  The smaller and more localized the
change, the better.

=item Does it preclude other desirable features?

A patch is likely to be rejected if it closes off future avenues of
development.  For instance, a patch that placed a true and final
interpretation on prototypes is likely to be rejected because there
are still options for the future of prototypes that haven't been
addressed.

=item Is the implementation robust?

Good patches (tight code, complete, correct) stand more chance of
going in.  Sloppy or incorrect patches might be placed on the back
burner until the pumpking has time to fix them, or might be discarded
altogether without further notice.

=item Is the implementation generic enough to be portable?

The worst patches make use of system-specific features.  It's highly
unlikely that nonportable additions to the Perl language will be
accepted.

=item Is there enough documentation?

Patches without documentation are probably ill thought out or
incomplete.  Nothing can be added without documentation, so submitting
a patch for the appropriate manpages as well as the source code is
always a good idea.  If appropriate, patches should add to the test
suite as well.

=item Is there another way to do it?

Larry said ``Although the Perl Slogan is I<There's More Than One Way
to Do It>, I hesitate to make 10 ways to do something''.  This is a
tricky heuristic to navigate, though--one man's essential addition is
another man's pointless cruft.

=item Does it create too much work?

Work for the pumpking, work for Perl programmers, work for module
authors, ...  Perl is supposed to be easy.

=item Patches speak louder than words

Working code is always preferred to pie-in-the-sky ideas.  A patch to
add a feature stands a much higher chance of making it to the language
than does a random feature request, no matter how fervently argued the
request might be.  This ties into ``Will it be useful?'', as the fact
that someone took the time to make the patch demonstrates a strong
desire for the feature.

=back

If you're on the list, you might hear the word ``core'' bandied
around.  It refers to the standard distribution.  ``Hacking on the
core'' means you're changing the C source code to the Perl
interpreter.  ``A core module'' is one that ships with Perl.

=head2 Keeping in sync

The source code to the Perl interpreter, in its different versions, is
kept in a repository managed by a revision control system (which is
currently the Perforce program, see http://perforce.com/).  The
pumpkings and a few others have access to the repository to check in
changes.  Periodically the pumpking for the development version of Perl
will release a new version, so the rest of the porters can see what's
changed.  The current state of the main trunk of the repository, and patches
that describe the individual changes that have happened since the last
public release are available at this location:

    ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/

If you are a member of the perl5-porters mailing list, it is a good
idea to keep in touch with the most recent changes, if only to verify
that what you would have posted as a bug report isn't already solved
in the most recent available perl development branch, also known as
perl-current, bleading edge perl, bleedperl or bleadperl.

Needless to say, the source code in perl-current is usually in a perpetual
state of evolution.  You should expect it to be very buggy.  Do B<not> use
it for any purpose other than testing and development.

Keeping in sync with the most recent branch can be done in several ways,
but the most convenient and reliable way is using B<rsync>, available at
ftp://rsync.samba.org/pub/rsync/ .  (You can also get the most recent
branch by FTP.)

If you choose to keep in sync using rsync, there are two approaches
to doing so:

=over 4

=item rsync'ing the source tree

Presuming you are in the directory where your perl source resides
and you have rsync installed and available, you can `upgrade' to
the bleadperl using:

 # rsync -avz rsync://ftp.linux.activestate.com/perl-current/ .

This takes care of updating every single item in the source tree to
the latest applied patch level, creating files that are new (to your
distribution) and setting date/time stamps of existing files to
reflect the bleadperl status.

You can then check what patch was the latest that was applied by
looking in the file B<.patch>, which will show the number of the
latest patch.

If you have more than one machine to keep in sync, and not all of
them have access to the WAN (so you are not able to rsync all the
source trees to the real source), there are some ways to get around
this problem.

=over 4

=item Using rsync over the LAN

Set up a local rsync server which makes the rsynced source tree
available to the LAN and sync the other machines against this
directory.

From http://rsync.samba.org/README.html:

   "Rsync uses rsh or ssh for communication. It does not need to be
    setuid and requires no special privileges for installation.  It
    does not require an inetd entry or a daemon.  You must, however,
    have a working rsh or ssh system.  Using ssh is recommended for
    its security features."

=item Using pushing over the NFS

Having the other systems mounted over the NFS, you can take an
active pushing approach by checking the just updated tree against
the other not-yet synced trees. An example would be

  #!/usr/bin/perl -w

  use strict;
  use File::Copy;

  my %MF = map {
      m/(\S+)/;
      $1 => [ (stat $1)[2, 7, 9] ];	# mode, size, mtime
      } `cat MANIFEST`;

  my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);

  foreach my $host (keys %remote) {
      unless (-d $remote{$host}) {
	  print STDERR "Cannot Xsync for host $host\n";
	  next;
	  }
      foreach my $file (keys %MF) {
	  my $rfile = "$remote{$host}/$file";
	  my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
	  defined $size or ($mode, $size, $mtime) = (0, 0, 0);
	  $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
	  printf "%4s %-34s %8d %9d  %8d %9d\n",
	      $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
	  unlink $rfile;
	  copy ($file, $rfile);
	  utime time, $MF{$file}[2], $rfile;
	  chmod $MF{$file}[0], $rfile;
	  }
      }

This is not perfect, though. It could be improved by checking file
checksums before updating, and not all NFS systems provide reliable
utime support (when used over the NFS).

=back
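
The push script above compares only size and modification time. As a
rough illustration of the checksum idea, here is a language-neutral
sketch written in C. It uses the 32-bit FNV-1a hash purely because it
is short; that choice, and the names C<file_fnv1a> and C<needs_copy>,
are assumptions made for this example. FNV-1a is not cryptographic, so
a real tool would probably want a stronger digest.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Compute the 32-bit FNV-1a hash of a file's contents.
 * Returns 0 on success, -1 if the file cannot be read. */
int
file_fnv1a(const char *path, uint32_t *sum)
{
    FILE *fp;
    uint32_t h = 2166136261u;           /* FNV offset basis */
    int c;

    assert(path != NULL && sum != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF) {
        h ^= (uint32_t)(unsigned char)c;
        h *= 16777619u;                 /* FNV prime */
    }
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    *sum = h;
    return 0;
}

/* Decide whether `local` should be copied over `remote`.
 * Returns  1 if the remote file is missing or differs,
 *          0 if both files hash to the same value,
 *         -1 if the local file itself cannot be read. */
int
needs_copy(const char *local, const char *remote)
{
    uint32_t a, b;

    if (file_fnv1a(local, &a) != 0)
        return -1;
    if (file_fnv1a(remote, &b) != 0)
        return 1;
    return a != b;
}
```

Reading every file on both sides is much slower over NFS than a
C<stat>, so one reasonable design is to keep the size and mtime test as
a first filter and compute checksums only for files that pass it.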

=item rsync'ing the patches

The source tree is maintained by the pumpking who applies patches to
the files in the tree. These patches are either created by the
pumpking himself using C<diff -c> after updating the file manually or
by applying patches sent in by posters on the perl5-porters list.
These patches are also saved and rsync'able, so you can apply them
yourself to the source files.

Presuming you are in a directory where your patches reside, you can
get them in sync with

 # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .

This makes sure the latest available patch is downloaded to your
patch directory.

It's then up to you to apply these patches, using something like

 # last=`ls -rt1 *.gz | tail -1`
 # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
 # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
 # cd ../perl-current
 # patch -p1 -N <../perl-current-diffs/blead.patch

or, since this is only a hint towards how it works, use CPAN-patchaperl
from Andreas König to have better control over the patching process.

=back

=head2 Why rsync the source tree

=over 4

=item It's easier

Since you don't have to apply the patches yourself, you are sure all
files in the source tree are in the right state.

=item It's more recent

According to Gurusamy Sarathy:

   "... The rsync mirror is automatic and syncs with the repository
    every five minutes.

   "Updating the patch  area  still  requires  manual  intervention
    (with all the goofiness that implies,  which you've noted)  and
    is typically on a daily cycle.   Making this process  automatic
    is on my tuit list, but don't ask me when."

=item It's more reliable

Well, since the patches are updated by hand, I don't have to say any
more ... (see Sarathy's remark).

=back

=head2 Why rsync the patches

=over 4

=item It's easier

If you have more than one machine that you want to keep in sync with
bleadperl, it's easier to rsync the patches only once and then apply
them to all the source trees on the different machines.

If you try to keep pace on 5 different machines, of which only one
has access to the WAN, rsync'ing all the source trees would then have
to be done 5 times over the NFS. Having rsync'ed the patches only
once, you can apply them to all the source trees automatically. Need
I say more ;-)

=item It's a good reference

If you not only like to have the most recent development branch,
but also like to B<fix> bugs or extend features, you will want to dive
into the sources. If you are a seasoned perl core diver, you don't
need no manuals, tips, roadmaps, perlguts.pod or other aids to find
your way around. But if you are a starter, the patches may help you
in finding where you should start and how to change the bits that
bug you.

The file B<Changes> is updated on occasions the pumpking sees as his
own little sync points. On those occasions, he releases a tar-ball of
the current source tree (e.g. perl@7582.tar.gz), which will be an
excellent point to start with when choosing to use the 'rsync the
patches' scheme. Starting with perl@7582, which means a set of source
files on which the latest applied patch is number 7582, you apply all
succeeding patches available from then on (7583, 7584, ...).

You can use the patches later as a kind of search archive.

=over 4

=item Finding a start point

If you want to fix/change the behaviour of function/feature Foo, just
scan the patches for patches that mention Foo either in the subject,
the comments, or the body of the fix. There is a good chance that the
patch shows you the files affected by that patch, which are very
likely to be the starting point of your journey into the guts of perl.

=item Finding how to fix a bug

If you've found I<where> the function/feature Foo misbehaves, but you
don't know how to fix it (but you do know the change you want to
make), you can, again, peruse the patches for similar changes and
see how others applied the fix.

=item Finding the source of misbehaviour

When you keep in sync with bleadperl, the pumpking would love to
I<see> that the community efforts really work. So after each of his
sync points, you are to 'make test' to check if everything is still
in working order. If it is, you do 'make ok', which will send an OK
report to perlbug@perl.org. (If you do not have access to a mailer
from the system on which you just successfully ran 'make test', you
can do 'make okfile', which creates the file C<perl.ok>, which you can
then take to your favourite mailer and mail yourself.)

But of course, as always, things will not always lead to a success
path, and one or more tests may not pass the 'make test'. Before
sending in a bug report (using 'make nok' or 'make nokfile'), check
the mailing list if someone else has reported the bug already and if
so, confirm it by replying to that message. If not, you might want to
trace the source of that misbehaviour B<before> sending in the bug,
which will help all the other porters in finding the solution.

Here the saved patches come in very handy. You can check the list of
patches to see which patch changed what file and what change caused
the misbehaviour. If you note that in the bug report, it saves the
person trying to solve it from having to look for that point.

=back

If searching the patches is too bothersome, you might consider using
perl's bugtron to find more information about discussions and
ramblings on posted bugs.

=back

If you want to get the best of both worlds, rsync both the source
tree for convenience, reliability and ease and rsync the patches
for reference.

=head2 Submitting patches

Always submit patches to I<perl5-porters@perl.org>.  This lets other
porters review your patch, which catches a surprising number of errors
in patches.  Either use the diff program (available in source code
form from I<ftp://ftp.gnu.org/pub/gnu/>), or use Johan Vromans'
I<makepatch> (available from I<CPAN/authors/id/JV/>).  Unified diffs
are preferred, but context diffs are accepted.  Do not send RCS-style
diffs or diffs without context lines.  More information is given in
the I<Porting/patching.pod> file in the Perl source distribution.
Please patch against the latest B<development> version (e.g., if
you're fixing a bug in the 5.005 track, patch against the latest
5.005_5x version).  Only patches that survive the heat of the
development branch get applied to maintenance versions.

Your patch should update the documentation and test suite.

To report a bug in Perl, use the program I<perlbug> which comes with
Perl (if you can't get Perl to work, send mail to the address
I<perlbug@perl.org> or I<perlbug@perl.com>).  Reporting bugs through
I<perlbug> feeds into the automated bug-tracking system, access to
which is provided through the web at I<http://bugs.perl.org/>.  It
often pays to check the archives of the perl5-porters mailing list to
see whether the bug you're reporting has been reported before, and if
so whether it was considered a bug.  See above for the location of
the searchable archives.

The CPAN testers (I<http://testers.cpan.org/>) are a group of
volunteers who test CPAN modules on a variety of platforms.  Perl Labs
(I<http://labs.perl.org/>) automatically tests Perl source releases on
platforms and gives feedback to the CPAN testers mailing list.  Both
efforts welcome volunteers.

It's a good idea to read and lurk for a while before chipping in.
That way you'll get to see the dynamic of the conversations, learn the
personalities of the players, and hopefully be better prepared to make
a useful contribution when you do speak up.

If after all this you still think you want to join the perl5-porters
mailing list, send mail to I<perl5-porters-subscribe@perl.org>.  To
unsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>.

To hack on the Perl guts, you'll need to read the following things:

=over 3

=item L<perlguts>

This is of paramount importance, since it's the documentation of what
goes where in the Perl source. Read it over a couple of times and it
might start to make sense - don't worry if it doesn't yet, because the
best way to study it is to read it in conjunction with poking at Perl
source, and we'll do that later on.

You might also want to look at Gisle Aas's illustrated perlguts -
there's no guarantee that this will be absolutely up-to-date with the
latest documentation in the Perl core, but the fundamentals will be
right. (http://gisle.aas.no/perl/illguts/)

=item L<perlxstut> and L<perlxs>

A working knowledge of XSUB programming is incredibly useful for core
hacking; XSUBs use techniques drawn from the PP code, the portion of the
guts that actually executes a Perl program. It's a lot gentler to learn
those techniques from simple examples and explanation than from the core
itself.

=item L<perlapi>

The documentation for the Perl API explains what some of the internal
functions do, as well as the many macros used in the source.

=item F<Porting/pumpkin.pod>

This is a collection of words of wisdom for a Perl porter; some of it is
only useful to the pumpkin holder, but most of it applies to anyone
wanting to go about Perl development.

=item The perl5-porters FAQ

This is posted to perl5-porters at the beginning of every month, and
should be available from http://perlhacker.org/p5p-faq; alternatively,
you can get the FAQ emailed to you by sending mail to
C<perl5-porters-faq@perl.org>. It contains hints on reading
perl5-porters, information on how perl5-porters works and how Perl
development in general works.

=back

=head2 Finding Your Way Around

Perl maintenance can be split into a number of areas, and certain people
(pumpkins) will have responsibility for each area. These areas sometimes
correspond to files or directories in the source kit. Among the areas are:

=over 3

=item Core modules

Modules shipped as part of the Perl core live in the F<lib/> and F<ext/>
subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
contains the core XS modules.

=item Documentation

Documentation maintenance includes looking after everything in the
F<pod/> directory (as well as contributing new documentation), and
the documentation to the modules in core.

=item Configure

The configure process is the way we make Perl portable across the
myriad of operating systems it supports. Responsibility for the
configure, build and installation process, as well as the overall
portability of the core code rests with the configure pumpkin - others
help out with individual operating systems.

The files involved are the operating system directories (F<win32/>,
F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h>
and F<Makefile>, as well as the metaconfig files which generate
F<Configure>. (metaconfig isn't included in the core distribution.)

=item Interpreter

And of course, there's the core of the Perl interpreter itself. Let's
have a look at that in a little more detail.

=back

Before we leave looking at the layout, though, don't forget that
F<MANIFEST> contains not only the file names in the Perl distribution,
but short descriptions of what's in them, too. For an overview of the
important files, try this:

    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST

=head2 Elements of the interpreter

The work of the interpreter has two main stages: compiling the code
into the internal representation, or bytecode, and then executing it.
L<perlguts/Compiled code> explains exactly how the compilation stage
happens.

Here is a short breakdown of perl's operation:

=over 3

=item Startup

The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.

First, F<perlmain.c> allocates some memory and constructs a Perl
interpreter:

    1 PERL_SYS_INIT3(&argc,&argv,&env);
    2
    3 if (!PL_do_undump) {
    4     my_perl = perl_alloc();
    5     if (!my_perl)
    6         exit(1);
    7     perl_construct(my_perl);
    8     PL_perl_destruct_level = 0;
    9 }

Line 1 is a macro, and its definition is dependent on your operating
system. Line 3 references C<PL_do_undump>, a global variable - all
global variables in Perl start with C<PL_>. This tells you whether the
current running program was created with the C<-u> flag to perl and then
F<undump>, which means it's going to be false in any sane context.

Line 4 calls a function in F<perl.c> to allocate memory for a Perl
interpreter. It's quite a simple function, and the guts of it looks like
this:

    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

Here you see an example of Perl's system abstraction, which we'll see
later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
own C<malloc> as defined in F<malloc.c> if you selected that option at
configure time.
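
The general shape of such an abstraction is easy to show in isolation.
The following is only a sketch: the macro C<ToyMem_malloc>, the
C<TOY_USE_OWN_MALLOC> switch and the C<ToyInterpreter> type are all
hypothetical names invented here, and the "own" allocator is a trivial
wrapper standing in for a real replacement C<malloc>.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical build-time switch, standing in for a Configure option. */
#ifdef TOY_USE_OWN_MALLOC
static size_t toy_malloc_calls;

/* Stand-in for a home-grown allocator: it only counts calls and then
 * defers to the system malloc. */
static void *
toy_own_malloc(size_t n)
{
    toy_malloc_calls++;
    return malloc(n);
}
#  define ToyMem_malloc(n) toy_own_malloc(n)
#else
#  define ToyMem_malloc(n) malloc(n)
#endif

typedef struct toy_interpreter {
    int destruct_level;
} ToyInterpreter;

/* Callers never name a concrete allocator; they go through the macro. */
ToyInterpreter *
toy_alloc(void)
{
    ToyInterpreter *interp =
        (ToyInterpreter *)ToyMem_malloc(sizeof(ToyInterpreter));

    if (interp != NULL)
        interp->destruct_level = 0;
    return interp;
}

void
toy_free(ToyInterpreter *interp)
{
    assert(interp != NULL);
    free(interp);
}
```

The point of the indirection is that code such as C<toy_alloc> stays
the same whichever allocator was chosen when the program was built.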

Next, in line 7, we construct the interpreter; this sets up all the
special variables that Perl needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
    if (!exitstatus) {
        exitstatus = perl_run(my_perl);
    }


C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
in F<perl.c>, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls C<yyparse> to
parse it.

=item Parsing

The aim of this stage is to take the Perl source, and turn it into an op
tree. We'll see what one of those looks like later. Strictly speaking,
there are three things going on here.

C<yyparse>, the parser, lives in F<perly.c>, although you're better off
reading the original YACC input in F<perly.y>. (Yes, Virginia, there
B<is> a YACC grammar for Perl!) The job of the parser is to take your
code and `understand' it, splitting it into sentences, deciding which
operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so on.
The main point of entry to the lexer is C<yylex>, and that and its
associated routines can be found in F<toke.c>. Perl isn't much like
other computer languages; it's highly context sensitive at times, and it
can be tricky to work out what sort of token something is, or where a token
ends. As such, there's a lot of interplay between the tokeniser and the
parser, which can get pretty frightening if you're not used to it.
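
A classic case of that context sensitivity is the C</> character, which
is a division operator after an operand but starts a pattern where a
term is expected. The toy lexer below sketches the idea under heavy
simplification; every name in it (C<toy_lex>, C<TOY_XTERM> and so on) is
hypothetical, and it understands only numbers, C<$name> variables, a
handful of operators and slash-delimited patterns.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* What the lexer believes should come next (hypothetical names). */
enum toy_expect { TOY_XTERM, TOY_XOPERATOR };

enum toy_type {
    TOY_END, TOY_NUMBER, TOY_VARIABLE, TOY_OPERATOR,
    TOY_DIVIDE, TOY_PATTERN, TOY_ERROR
};

struct toy_lexer {
    const char     *p;       /* current position in the input */
    enum toy_expect expect;  /* is a term or an operator due next? */
};

/* Return the type of the next token and advance past it. */
enum toy_type
toy_lex(struct toy_lexer *lx)
{
    const char *p;
    enum toy_type type;

    assert(lx != NULL && lx->p != NULL);
    p = lx->p;
    while (*p == ' ')
        p++;

    if (*p == '\0') {
        type = TOY_END;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        type = TOY_NUMBER;
        lx->expect = TOY_XOPERATOR;
    } else if (*p == '$' && isalpha((unsigned char)p[1])) {
        p++;
        while (isalnum((unsigned char)*p))
            p++;
        type = TOY_VARIABLE;
        lx->expect = TOY_XOPERATOR;
    } else if (*p == '/' && lx->expect == TOY_XOPERATOR) {
        p++;                            /* after an operand: division */
        type = TOY_DIVIDE;
        lx->expect = TOY_XTERM;
    } else if (*p == '/') {
        p++;                            /* term expected: a pattern */
        while (*p != '\0' && *p != '/')
            p++;
        if (*p == '/') {
            p++;
            type = TOY_PATTERN;
            lx->expect = TOY_XOPERATOR;
        } else {
            type = TOY_ERROR;           /* unterminated pattern */
        }
    } else if (p[0] == '=' && p[1] == '~') {
        p += 2;
        type = TOY_OPERATOR;
        lx->expect = TOY_XTERM;
    } else if (strchr("+-*=", *p) != NULL) {
        p++;
        type = TOY_OPERATOR;
        lx->expect = TOY_XTERM;
    } else {
        type = TOY_ERROR;
    }

    lx->p = p;
    return type;
}
```

Fed C<$x / 2 / 4>, this yields two division tokens; fed C<$x =~ / 2 />,
the same characters come back as a single pattern token, because the
binding operator left the lexer expecting a term.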

As the parser understands a Perl program, it builds up a tree of
operations for the interpreter to perform during execution. The routines
which construct and link together the various operations are to be found
in F<op.c>, and will be examined later.

=item Optimization

Now the parsing stage is complete, and the finished tree represents
the operations that the Perl interpreter needs to perform to execute our
program. Next, Perl does a dry run over the tree looking for
optimisations: constant expressions such as C<3 + 4> will be computed
now, and the optimizer will also see if any multiple operations can be
replaced with a single one. For instance, to fetch the variable C<$foo>,
instead of grabbing the glob C<*foo> and looking at the scalar
component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question. The main optimizer is C<peep>
in F<op.c>, and many ops have their own optimizing functions.
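
Constant folding is simple enough to sketch on a toy tree. The code
below is not taken from F<op.c>; the C<toy_op> structure and the
functions C<toy_new_op> and C<toy_fold> are hypothetical, and the only
operation the sketch knows how to fold is addition.

```c
#include <assert.h>
#include <stdlib.h>

enum toy_optype { TOY_CONST, TOY_VAR, TOY_ADD };

struct toy_op {
    enum toy_optype type;
    long            value;   /* meaningful for TOY_CONST only */
    struct toy_op  *left;    /* children, for TOY_ADD only */
    struct toy_op  *right;
};

struct toy_op *
toy_new_op(enum toy_optype type, long value,
           struct toy_op *left, struct toy_op *right)
{
    struct toy_op *o = malloc(sizeof *o);

    if (o != NULL) {
        o->type  = type;
        o->value = value;
        o->left  = left;
        o->right = right;
    }
    return o;
}

/* Walk the tree bottom-up.  An addition whose operands are both
 * constants is replaced, in place, by a single constant.  Returns the
 * number of folds performed. */
int
toy_fold(struct toy_op *o)
{
    int folds = 0;

    if (o == NULL || o->type != TOY_ADD)
        return 0;
    assert(o->left != NULL && o->right != NULL);

    folds += toy_fold(o->left);
    folds += toy_fold(o->right);

    if (o->left->type == TOY_CONST && o->right->type == TOY_CONST) {
        o->value = o->left->value + o->right->value;
        free(o->left);
        free(o->right);
        o->left = o->right = NULL;
        o->type = TOY_CONST;
        folds++;
    }
    return folds;
}
```

For a tree representing C<(3 + 4) + $x>, one fold happens: the inner
addition becomes the constant 7, while the outer addition survives
because one of its operands is only known at run time.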

=item Running

Now we're finally ready to go: we have compiled Perl byte code, and all
that's left to do is run it. The actual execution is done by the
C<runops_standard> function in F<run.c>; more specifically, it's done by
these three innocent looking lines:

    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
        PERL_ASYNC_CHECK();
    }

You may be more comfortable with the Perl version of that:

    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

Well, maybe not. Anyway, each op contains a function pointer, which
stipulates the function which will actually carry out the operation.
This function will return the next op in the sequence - this allows for
things like C<if> which choose the next op dynamically at run time.
The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
execution if required.
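
If neither version helps, a stripped-down model may. The sketch below
builds a tiny op chain and runs it with the same kind of loop. It is an
illustration only: the C<toy_op> layout, the C<pp_*> functions and the
single global accumulator are hypothetical simplifications, not the
structures Perl uses.

```c
#include <assert.h>
#include <stddef.h>

struct toy_op;
typedef struct toy_op *(*toy_ppaddr)(struct toy_op *);

struct toy_op {
    toy_ppaddr     ppaddr;  /* function that carries out this op */
    struct toy_op *next;    /* usual successor */
    struct toy_op *other;   /* alternative successor, for branches */
    long           arg;
};

long toy_acc;               /* one-slot stand-in for a real stack */

struct toy_op *
pp_load(struct toy_op *o)
{
    toy_acc = o->arg;
    return o->next;
}

struct toy_op *
pp_add(struct toy_op *o)
{
    toy_acc += o->arg;
    return o->next;
}

/* A branch: the op itself decides, at run time, which op comes next. */
struct toy_op *
pp_if_zero(struct toy_op *o)
{
    return toy_acc == 0 ? o->other : o->next;
}

/* Run ops until one of them returns NULL. */
long
toy_runops(struct toy_op *op)
{
    assert(op != NULL);
    while ((op = op->ppaddr(op)) != NULL) {
        /* a real run loop would check for pending signals here */
    }
    return toy_acc;
}
```

The loop knows nothing about branching; all control flow lives in the
value each op function returns, which is why the real loop can be so
short.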

The actual functions called are known as PP code, and they're spread
between four files: F<pp_hot.c> contains the `hot' code, which is most
often used and highly optimized, F<pp_sys.c> contains all the
system-specific functions, F<pp_ctl.c> contains the functions which
implement control structures (C<if>, C<while> and the like) and F<pp.c>
contains everything else. These are, if you like, the C code for Perl's
built-in functions and operators.

=back

=head2 Internal Variable Types

You should by now have had a look at L<perlguts>, which tells you about
Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
that now.

These variables are used not only to represent Perl-space variables, but
also any constants in the code, as well as some structures completely
internal to Perl. The symbol table, for instance, is an ordinary Perl
hash. Your code is represented by an SV as it's read into the parser;
any program files you call are opened via ordinary Perl filehandles, and
so on.

The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a
Perl program. Let's see, for instance, how Perl treats the constant
C<"hello">.

      % perl -MDevel::Peek -e 'Dump("hello")'
    1 SV = PV(0xa041450) at 0xa04ecbc
    2   REFCNT = 1
    3   FLAGS = (POK,READONLY,pPOK)
    4   PV = 0xa0484e0 "hello"\0
    5   CUR = 5
    6   LEN = 6

Reading C<Devel::Peek> output takes a bit of practise, so let's go
through it line by line.

Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
memory. SVs themselves are very simple structures, but they contain a
pointer to a more complex structure. In this case, it's a PV, a
structure which holds a string value, at location C<0xa041450>.  Line 2
is the reference count; there are no other references to this data, so
it's 1.

Line 3 shows the flags for this SV - it's OK to use it as a PV, it's a
read-only SV (because it's a constant) and the data is a PV internally.
Line 4 gives the contents of the string, starting at location
C<0xa0484e0>.

Line 5 gives us the current length of the string - note that this does
B<not> include the null terminator. Line 6 is not the length of the
string, but the length of the currently allocated buffer; as the string
grows, Perl automatically extends the available storage via a routine
called C<SvGROW>.

You can get at any of these quantities from C very easily; just add
C<Sv> to the name of the field shown in the snippet, and you've got a
macro which will return the value: C<SvCUR(sv)> returns the current
length of the string, C<SvREFCNT(sv)> returns the reference count,
C<SvPV(sv, len)> returns the string itself with its length, and so on.
More macros to manipulate these properties can be found in L<perlguts>.

Let's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c>:

     1  void
     2  Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
     3  {
     4      STRLEN tlen;
     5      char *junk;

     6      junk = SvPV_force(sv, tlen);
     7      SvGROW(sv, tlen + len + 1);
     8      if (ptr == junk)
     9          ptr = SvPVX(sv);
    10      Move(ptr,SvPVX(sv)+tlen,len,char);
    11      SvCUR(sv) += len;
    12      *SvEND(sv) = '\0';
    13      (void)SvPOK_only_UTF8(sv);          /* validate pointer */
    14      SvTAINT(sv);
    15  }

This is a function which adds a string, C<ptr>, of length C<len> onto
the end of the PV stored in C<sv>. The first thing we do in line 6 is
make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
macro to force a PV. As a side effect, C<tlen> gets set to the current
length of the PV, and the PV itself is returned to C<junk>.

In line 7, we make sure that the SV will have enough room to accommodate
the old string, the new string and the null terminator. If C<LEN> isn't
big enough, C<SvGROW> will reallocate space for us.

Now, if C<junk> is the same as the string we're trying to add, we can
grab the string directly from the SV; C<SvPVX> is the address of the PV
in the SV.

Line 10 does the actual catenation: the C<Move> macro moves a chunk of
memory around: we move the string C<ptr> to the end of the PV - that's
the start of the PV plus its current length. We're moving C<len> bytes
of type C<char>. After doing so, we need to tell Perl we've extended the
string, by altering C<CUR> to reflect the new length. C<SvEND> is a
macro which gives us the end of the string, so that needs to be a
C<"\0">.

Line 13 manipulates the flags; since we've changed the PV, any IV or NV
values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF8-aware
version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
and turns on POK. The final C<SvTAINT> is a macro which marks the SV as
tainted if taint mode is turned on and tainted data was involved.
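
To see the C<CUR>/C<LEN> bookkeeping in isolation, here is a small
standalone sketch that follows the same steps as the listing above using
nothing but the C library. It is an assumption-laden toy: C<toy_pv>,
C<toy_grow> and C<toy_catpvn> are invented names, C<realloc> and
C<memmove> stand in for C<SvGROW> and C<Move>, and there are no flags or
tainting at all.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a PV body; field names echo the Devel::Peek output. */
typedef struct {
    char  *pv;   /* start of the buffer */
    size_t cur;  /* length in use, not counting the trailing NUL */
    size_t len;  /* bytes allocated */
} toy_pv;

/* Make sure at least `want` bytes are allocated; returns 0 on failure. */
static int toy_grow(toy_pv *s, size_t want)
{
    char *p;
    if (want <= s->len)
        return 1;
    p = realloc(s->pv, want);
    if (p == NULL)
        return 0;
    s->pv  = p;
    s->len = want;
    return 1;
}

/* Append `len` bytes from `ptr` onto the end of the toy string. */
static int toy_catpvn(toy_pv *s, const char *ptr, size_t len)
{
    int self = (ptr == s->pv);          /* appending the buffer to itself? */
    if (!toy_grow(s, s->cur + len + 1)) /* old + new + terminator */
        return 0;
    if (self)
        ptr = s->pv;                    /* the buffer may have moved */
    memmove(s->pv + s->cur, ptr, len);  /* copy to the end of the string */
    s->cur += len;                      /* record the new length */
    s->pv[s->cur] = '\0';               /* keep the string terminated */
    return 1;
}
```

Start from C<toy_pv s = { NULL, 0, 0 };> and free C<s.pv> when you are
done. The self-append check is made before growing, because the old
buffer address is no longer meaningful once C<realloc> has moved it.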

AVs and HVs are more complicated, but SVs are by far the most common
variable type being thrown around. Having seen something of how we
manipulate these, let's go on and look at how the op tree is
constructed.

=head2 Op Trees

First, what is the op tree, anyway? The op tree is the parsed
representation of your program, as we saw in our section on parsing, and
it's the sequence of operations that Perl goes through to execute your
program, as we saw in L</Running>.

An op is a fundamental operation that Perl can perform: all the built-in
functions and operators are ops, and there are a series of ops which
deal with concepts the interpreter needs internally - entering and
leaving a block, ending a statement, fetching a variable, and so on.

The op tree is connected in two ways: you can imagine that there are two
"routes" through it, two orders in which you can traverse the tree.
First, parse order reflects how the parser understood the code, and
secondly, execution order tells perl what order to perform the
operations in.

The easiest way to examine the op tree is to stop Perl after it has
finished parsing, and get it to dump out the tree. This is exactly what
the compiler backends L<B::Terse|B::Terse> and L<B::Debug|B::Debug> do.

Let's have a look at how Perl sees C<$a = $b + $c>:

     % perl -MO=Terse -e '$a=$b+$c'
     1  LISTOP (0x8179888) leave
     2      OP (0x81798b0) enter
     3      COP (0x8179850) nextstate
     4      BINOP (0x8179828) sassign
     5          BINOP (0x8179800) add [1]
     6              UNOP (0x81796e0) null [15]
     7                  SVOP (0x80fafe0) gvsv  GV (0x80fa4cc) *b
     8              UNOP (0x81797e0) null [15]
     9                  SVOP (0x8179700) gvsv  GV (0x80efeb0) *c
    10          UNOP (0x816b4f0) null [15]
    11              SVOP (0x816dcf0) gvsv  GV (0x80fa460) *a

Let's start in the middle, at line 4. This is a BINOP, a binary
operator, which is at location C<0x8179828>. The specific operator in
question is C<sassign> - scalar assignment - and you can find the code
which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
binary operator, it has two children: the add operator, providing the
result of C<$b+$c>, is uppermost on line 5, and the left hand side is on
line 10.

Line 10 is the null op: this does exactly nothing. What is that doing
there? If you see the null op, it's a sign that something has been
optimized away after parsing. As we mentioned in L</Optimization>,
the optimization stage sometimes converts two operations into one, for
example when fetching a scalar variable. When this happens, instead of
rewriting the op tree and cleaning up the dangling pointers, it's easier
just to replace the redundant operation with the null op. Originally,
the tree would have looked like this:

    10          UNOP (0x816b4f0) rv2sv [15]
    11              SVOP (0x816dcf0) gv  GV (0x80fa460) *a

That is, fetch the C<a> entry from the main symbol table, and then look
at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
happens to do both these things.

The right hand side, starting at line 5 is similar to what we've just
seen: we have the C<add> op (C<pp_add> also in F<pp_hot.c>) add together
two C<gvsv>s.

Now, what's this about?

     1  LISTOP (0x8179888) leave
     2      OP (0x81798b0) enter
     3      COP (0x8179850) nextstate

C<enter> and C<leave> are scoping ops, and their job is to perform any
housekeeping every time you enter and leave a block: lexical variables
are tidied up, unreferenced variables are destroyed, and so on. Every
program will have those first three lines: C<leave> is a list, and its
children are all the statements in the block. Statements are delimited
by C<nextstate>, so a block is a collection of C<nextstate> ops, each
followed by the ops to be performed for that statement (in the dump
above, C<sassign> sits at the same level as C<nextstate>). C<enter> is a
single op which functions as a marker.

That's how Perl parsed the program, from top to bottom:

                        Program
                           |
                       Statement
                           |
                           =
                          / \
                         /   \
                        $a   +
                            / \
                          $b   $c

However, it's impossible to B<perform> the operations in this order:
you have to find the values of C<$b> and C<$c> before you add them
together, for instance. So, the other thread that runs through the op
tree is the execution order: each op has a field C<op_next> which points
to the next op to be run, so following these pointers tells us how perl
executes the code. We can traverse the tree in this order using
the C<exec> option to C<B::Terse>:

     % perl -MO=Terse,exec -e '$a=$b+$c'
     1  OP (0x8179928) enter
     2  COP (0x81798c8) nextstate
     3  SVOP (0x81796c8) gvsv  GV (0x80fa4d4) *b
     4  SVOP (0x8179798) gvsv  GV (0x80efeb0) *c
     5  BINOP (0x8179878) add [1]
     6  SVOP (0x816dd38) gvsv  GV (0x80fa468) *a
     7  BINOP (0x81798a0) sassign
     8  LISTOP (0x8179900) leave

This probably makes more sense for a human: enter a block, start a
statement. Get the values of C<$b> and C<$c>, and add them together.
Find C<$a>, and assign one to the other. Then leave.
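
The two routes through the tree can be sketched in standalone C. This is
a toy under stated assumptions, not how Perl links its ops: the
C<toy_op> layout and the C<toy_thread> routine are invented here, the
scoping ops (C<enter>, C<nextstate>, C<leave>) and null ops are left
out, and the execution order is simply taken to be "children before
their parent".

```c
#include <stddef.h>
#include <string.h>

/* Toy op: a hypothetical layout with both routes through the tree. */
typedef struct toy_op toy_op;
struct toy_op {
    const char *name;
    toy_op *first;    /* first child (parse order) */
    toy_op *sibling;  /* next child of the same parent (parse order) */
    toy_op *next;     /* next op to run (execution order) */
};

/* Thread `next` pointers through the subtree at `o` so that children run
 * before their parent. `tail` is the last op threaded so far (or NULL);
 * the new last op is returned. */
static toy_op *toy_thread(toy_op *o, toy_op *tail)
{
    toy_op *kid;
    for (kid = o->first; kid != NULL; kid = kid->sibling)
        tail = toy_thread(kid, tail);
    if (tail != NULL)
        tail->next = o;
    return o;
}

/* The leftmost leaf is the first op to execute. */
static toy_op *toy_start(toy_op *o)
{
    while (o->first != NULL)
        o = o->first;
    return o;
}

/* Write the execution order for $a = $b + $c into buf (size must be > 0). */
static void toy_exec_order(char *buf, size_t size)
{
    toy_op c       = { "gvsv(c)", NULL, NULL, NULL };
    toy_op b       = { "gvsv(b)", NULL, &c,   NULL };
    toy_op a       = { "gvsv(a)", NULL, NULL, NULL };
    toy_op add     = { "add",     &b,   &a,   NULL };
    toy_op sassign = { "sassign", &add, NULL, NULL };
    toy_op *o;

    toy_thread(&sassign, NULL);
    buf[0] = '\0';
    for (o = toy_start(&sassign); o != NULL; o = o->next) {
        strncat(buf, o->name, size - strlen(buf) - 1);
        if (o->next != NULL)
            strncat(buf, " ", size - strlen(buf) - 1);
    }
}
```

The tree is built in parse order (C<sassign> on top, with C<add> and the
fetch of C<$a> as its children), but following the C<next> pointers
visits the ops in the same order as the C<exec> listing above.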

The way Perl builds up these op trees in the parsing process can be
unravelled by examining F<perly.y>, the YACC grammar. Let's take the
piece we need to construct the tree for C<$a = $b + $c>:

    1 term    :   term ASSIGNOP term
    2                { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
    3         |   term ADDOP term
    4                { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

If you're not used to reading BNF grammars, this is how it works: You're
fed certain things by the tokeniser, which generally end up in upper
case. Here, C<ADDOP> is provided when the tokeniser sees C<+> in your
code. C<ASSIGNOP> is provided when C<=> is used for assigning. These are
`terminal symbols', because you can't get any simpler than them.

The grammar, lines one and three of the snippet above, tells you how to
build up more complex forms. These complex forms, `non-terminal symbols',
are generally placed in lower case. C<term> here is a non-terminal
symbol, representing a single expression.

The grammar gives you the following rule: you can make the thing on the
left of the colon if you see all the things on the right in sequence.
This is called a "reduction", and the aim of parsing is to completely
reduce the input. There are several different ways you can perform a
reduction, separated by vertical bars: so, C<term> followed by C<=>
followed by C<term> makes a C<term>, and C<term> followed by C<+>
followed by C<term> can also make a C<term>.

So, if you see two terms with an C<=> or C<+> between them, you can
turn them into a single expression. When you do this, you execute the
code in the block on the next line: if you see C<=>, you'll do the code
in line 2. If you see C<+>, you'll do the code in line 4. It's this code
which contributes to the op tree.

            |   term ADDOP term
            { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

What this does is create a new binary op, and feed it a number of
variables. The variables refer to the tokens: C<$1> is the first token in
the input, C<$2> the second, and so on - think regular expression
backreferences. C<$$> is the op returned from this reduction. So, we
call C<newBINOP> to create a new binary operator. The first parameter to
C<newBINOP>, a function in F<op.c>, is the op type. It's an addition
operator, so we want the type to be C<ADDOP>. We could specify this
directly, but it's right there as the second token in the input, so we
use C<$2>. The second parameter is the op's flags: 0 means `nothing
special'. Then the things to add: the left and right hand side of our
expression, in scalar context.

=head2 Stacks

When perl executes something like C<addop>, how does it pass on its
results to the next op? The answer is, through the use of stacks. Perl
has a number of stacks to store things it's currently working on, and
we'll look at the three most important ones here.

=over 3

=item Argument stack

Arguments are passed to PP code and returned from PP code using the
argument stack, C<ST>. The typical way to handle arguments is to pop
them off the stack, deal with them how you wish, and then push the result
back onto the stack. This is how, for instance, the cosine operator
works:

      NV value;
      value = POPn;
      value = Perl_cos(value);
      XPUSHn(value);

We'll see a more tricky example of this when we consider Perl's macros
below. C<POPn> gives you the NV (floating point value) of the top SV on
the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push
the result back as an NV. The C<X> in C<XPUSHn> means that the stack
should be extended if necessary - it can't be necessary here, because we
know there's room for one more item on the stack, since we've just
removed one! The C<XPUSH*> macros at least guarantee safety.

Alternatively, you can fiddle with the stack directly: C<SP> gives you
the first element in your portion of the stack, and C<TOP*> gives you
the top SV/IV/NV/etc. on the stack. So, for instance, to do unary
negation of an integer:

     SETi(-TOPi);

Just set the integer value of the top stack entry to its negation.

Argument stack manipulation in the core is exactly the same as it is in
XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
description of the macros used in stack manipulation.

=item Mark stack

I say `your portion of the stack' above because PP code doesn't
necessarily get the whole stack to itself: if your function calls
another function, you'll only want to expose the arguments aimed for the
called function, and not (necessarily) let it get at your own data. The
way we do this is to have a `virtual' bottom-of-stack, exposed to each
function. The mark stack keeps bookmarks to locations in the argument
stack usable by each function. For instance, when dealing with a tied
variable, (internally, something with `P' magic) Perl has to call
methods for accesses to the tied variables. However, we need to separate
the arguments exposed to the method from the arguments exposed to the
original function - the store or fetch or whatever it may be. Here's how
the tied C<push> is implemented; see C<av_push> in F<av.c>:

     1	PUSHMARK(SP);
     2	EXTEND(SP,2);
     3	PUSHs(SvTIED_obj((SV*)av, mg));
     4	PUSHs(val);
     5	PUTBACK;
     6	ENTER;
     7	call_method("PUSH", G_SCALAR|G_DISCARD);
     8	LEAVE;
     9	POPSTACK;

The lines which concern the mark stack are the first, fifth and last
lines: they save away, restore and remove the current position of the
argument stack. 

Let's examine the whole implementation, for practice:

     1	PUSHMARK(SP);

Push the current state of the stack pointer onto the mark stack. This is
so that when we've finished adding items to the argument stack, Perl
knows how many things we've added recently.

     2	EXTEND(SP,2);
     3	PUSHs(SvTIED_obj((SV*)av, mg));
     4	PUSHs(val);

We're going to add two more items onto the argument stack: when you have
a tied array, the C<PUSH> subroutine receives the object and the value
to be pushed, and that's exactly what we have here - the tied object,
retrieved with C<SvTIED_obj>, and the value, the SV C<val>.

     5	PUTBACK;

Next we tell Perl to make the change to the global stack pointer: C<dSP>
only gave us a local copy, not a reference to the global.

     6	ENTER;
     7	call_method("PUSH", G_SCALAR|G_DISCARD);
     8	LEAVE;

C<ENTER> and C<LEAVE> localise a block of code - they make sure that all
variables are tidied up, everything that has been localised gets
its previous value returned, and so on. Think of them as the C<{> and
C<}> of a Perl block.

To actually do the magic method call, we have to call a subroutine in
Perl space: C<call_method> takes care of that, and it's described in
L<perlcall>. We call the C<PUSH> method in scalar context, and we're
going to discard its return value.

     9	POPSTACK;

Finally, we remove the value we placed on the mark stack, since we
don't need it any more.

=item Save stack

C doesn't have a concept of local scope, so perl provides one. We've
seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save
stack implements the C equivalent of, for example:

    {
        local $foo = 42;
        ...
    }

See L<perlguts/Localising Changes> for how to use the save stack.

=back
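
Here is a standalone toy that shows how an argument stack and a mark
stack cooperate. Everything in it is an assumption made for
illustration: the C<toy_*> names are invented, the stacks are fixed-size
arrays with no bounds checking, and the callee pops the mark itself,
which is just one possible convention. It does not model the save stack.

```c
#define TOY_STACK_MAX 32

/* Toy interpreter state; all names are hypothetical. */
static double toy_stack[TOY_STACK_MAX];    /* argument stack */
static int    toy_sp = -1;                 /* index of the top argument */
static int    toy_marks[TOY_STACK_MAX];    /* mark stack: saved positions */
static int    toy_markp = -1;              /* index of the top mark */

static void   toy_push(double v)  { toy_stack[++toy_sp] = v; }
static double toy_pop(void)       { return toy_stack[toy_sp--]; }
static void   toy_pushmark(void)  { toy_marks[++toy_markp] = toy_sp; }
static int    toy_popmark(void)   { return toy_marks[toy_markp--]; }

/* A callee that sums only its own arguments: everything above the most
 * recent mark. It replaces them with a single result. */
static void toy_sum(void)
{
    int    mark  = toy_popmark();
    double total = 0.0;
    while (toy_sp > mark)
        total += toy_pop();
    toy_push(total);
}

/* The caller keeps private data (10.0) on the stack; the mark stops the
 * callee from seeing it. Returns result * 100 + the private value. */
static double toy_caller(void)
{
    double result, mine;
    toy_sp = -1;
    toy_markp = -1;
    toy_push(10.0);          /* caller's own data */
    toy_pushmark();          /* bookmark: callee's arguments start here */
    toy_push(1.0);
    toy_push(2.0);
    toy_push(3.0);
    toy_sum();               /* sums 1, 2 and 3 but never touches 10 */
    result = toy_pop();      /* the callee's return value */
    mine   = toy_pop();      /* the caller's data, still in place */
    return result * 100.0 + mine;
}
```

The mark is what gives the callee its `virtual' bottom-of-stack: it pops
down to the mark and no further, so the caller's value underneath
survives the call.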

=head2 Millions of Macros

One thing you'll notice about the Perl source is that it's full of
macros. Some have called the pervasive use of macros the hardest thing
to understand, others find it adds to clarity. Let's take an example,
the code which implements the addition operator:

   1  PP(pp_add)
   2  {
   3      dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
   4      {
   5        dPOPTOPnnrl_ul;
   6        SETn( left + right );
   7        RETURN;
   8      }
   9  }

Every line here (apart from the braces, of course) contains a macro. The
first line sets up the function declaration as Perl expects for PP code;
line 3 sets up variable declarations for the argument stack and the
target, the return value of the operation. Finally, it tries to see if
the addition operation is overloaded; if so, the appropriate subroutine
is called.

Line 5 is another variable declaration - all variable declarations start
with C<d> - which pops from the top of the argument stack two NVs (hence
C<nn>) and puts them into the variables C<right> and C<left>, hence the
C<rl>. These are the two operands to the addition operator. Next, we
call C<SETn> to set the NV of the return value to the result of adding
the two values. This done, we return - the C<RETURN> macro makes sure
that our return value is properly handled, and we pass the next operator
to run back to the main run loop.
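
If the macro style is what trips you up, it may help to see it at toy
scale. The sketch below defines its own miniature macros over a plain
array of doubles. The C<TOY> names only imitate the shape of Perl's
macros and are defined right here, not taken from Perl's headers, and
overloading and the op target are ignored completely.

```c
/* Toy stack and macros, redefined here purely for illustration. */
static double toy_stack[8];
static double *toy_stack_sp = toy_stack;       /* "global" stack pointer */

#define dTOYSP          double *sp = toy_stack_sp  /* declare a local copy */
#define TOYPUSHn(n)     (*++sp = (n))              /* push a number */
#define TOYPOPn         (*sp--)                    /* pop a number */
#define TOYTOPn         (*sp)                      /* peek at the top */
#define TOYSETn(n)      (*sp = (n))                /* overwrite the top */
#define TOYPUTBACK      (toy_stack_sp = sp)        /* publish the local copy */
/* Declare right and left: pop one operand, peek at the other. */
#define dTOYPOPTOPnnrl  double right = TOYPOPn; double left = TOYTOPn

/* A toy addition op in the same shape as the listing above. */
static void toy_pp_add(void)
{
    dTOYSP;
    dTOYPOPTOPnnrl;
    TOYSETn(left + right);
    TOYPUTBACK;
}

/* Push two operands, run the op, and pop the result. */
static double toy_add(double a, double b)
{
    double result;
    {
        dTOYSP;
        TOYPUSHn(a);
        TOYPUSHn(b);
        TOYPUTBACK;
    }
    toy_pp_add();
    {
        dTOYSP;
        result = TOYPOPn;
        TOYPUTBACK;
    }
    return result;
}
```

Note how the declaration macros hide real variable declarations, which
is why they have to come first in a block, and how the local C<sp> has
to be written back before anyone else looks at the stack.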

Most of these macros are explained in L<perlapi>, and some of the more
important ones are explained in L<perlxs> as well. Pay special attention
to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on
the C<[pad]THX_?> macros.


=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

    ./Configure -d -D optimize=-g
    make

C<-g> is a flag to the C compiler to have it produce debugging
information which will allow us to step through a running program.
F<Configure> will also turn on the C<DEBUGGING> compilation symbol which
enables all the internal debugging code in Perl. There are a whole bunch
of things you can debug with this: L<perlrun> lists them all, and the
best way to find out about them is to play about with them. The most
useful options are probably

    l  Context (loop) stack processing
    t  Trace execution
    o  Method and overloading resolution
    c  String/numeric conversions

Some of the functionality of the debugging code can be achieved using XS
modules.

    -Dr => use re 'debug'
    -Dx => use O 'Debug'

=head2 Using a source-level debugger

If the debugging output of C<-D> doesn't help you, it's time to step
through perl's execution with a source-level debugger.

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to any
debugger, but check the manual of the one you're using.

=back

To fire up the debugger, type

    gdb ./perl

You'll want to do that in your Perl source tree so the debugger can read
the source code. You should see the copyright message, followed by the
prompt.

    (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item run [args]

Run the program with the given arguments.

=item break function_name

=item break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or the given
line in the named source file.

=item step

Steps through the program a line at a time.

=item next

Steps through the program a line at a time, without descending into
functions.

=item continue

Run until the next breakpoint.

=item finish

Run until the end of the current function, then stop again.

=item 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.

=item print

Execute the given C code and print its results. B<WARNING>: Perl makes
heavy use of macros, and C<gdb> is not aware of macros. You'll have to
substitute them yourself. So, for instance, you can't say

    print SvPV_nolen(sv)

but you have to say

    print Perl_sv_2pv_nolen(sv)

You may find it helpful to have a "macro dictionary", which you can
produce by saying C<cpp -dM perl.c | sort>. Even then, C<cpp> won't
recursively apply the macros for you.

=back

=head2 Dumping Perl Data Structures

One way to get around this macro hell is to use the dumping functions in
F<dump.c>; these work a little like an internal
L<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures
that you can't get at from Perl. Let's take an example. We'll use the
C<$a = $b + $c> we used before, but give it a bit of context: 
C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around?

What about C<pp_add>, the function we examined earlier to implement the
C<+> operator:

    (gdb) break Perl_pp_add
    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.

Notice we use C<Perl_pp_add> and not C<pp_add> - see L<perlguts/Internal Functions>.
With the breakpoint in place, we can run our program:

    (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'

Lots of junk will go past as gdb reads in the relevant source files and
libraries, and then:

    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
    309         dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
    (gdb) step
    311           dPOPTOPnnrl_ul;
    (gdb)

We looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul>
arranges for two C<NV>s to be placed into C<left> and C<right> - let's
slightly expand it:

    #define dPOPTOPnnrl_ul  NV right = POPn; \
                            SV *leftsv = TOPs; \
                            NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0

C<POPn> takes the SV from the top of the stack and obtains its NV either
directly (if C<SvNOK> is set) or by calling the C<sv_2nv> function.
C<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses
C<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from
C<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>. 

Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to
convert it. If we step again, we'll find ourselves there:

    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
    1669        if (!sv)
    (gdb)

We can now use C<Perl_sv_dump> to investigate the SV:

    (gdb) print Perl_sv_dump(sv)
    SV = PV(0xa057cc0) at 0xa0675d0
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0xa06a510 "6XXXX"\0
    CUR = 5
    LEN = 6
    $1 = void

We know we're going to get C<6> from this, so let's finish the
subroutine:

    (gdb) finish
    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
    0x462669 in Perl_pp_add () at pp_hot.c:311
    311           dPOPTOPnnrl_ul;
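
Why C<6>? The conversion takes the numeric prefix of the string and
ignores the rest. As a rough standalone illustration, the C library's
C<strtod> behaves the same way on this particular input. This is only a
stand-in chosen for the sketch, not the routine Perl uses, and the two
will not agree on every string; the C<toy_*> names are invented.

```c
#include <stdlib.h>

/* Numeric value of the leading part of a string, or 0.0 if the string
 * does not start with a number. strtod() is a stand-in here, not the
 * routine Perl itself uses. */
static double toy_str_to_nv(const char *s)
{
    return strtod(s, NULL);
}

/* The sum from the session above: "6XXXX" + 2.3 */
static double toy_add_example(void)
{
    return toy_str_to_nv("6XXXX") + 2.3;
}
```

So by the time the addition itself runs, C<left> is C<6> and C<right> is
C<2.3>.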

We can also dump out this op: the current op is always stored in
C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
similar output to L<B::Debug|B::Debug>.

    {
    13  TYPE = add  ===> 14
        TARG = 1
        FLAGS = (SCALAR,KIDS)
        {
            TYPE = null  ===> (12)
              (was rv2sv)
            FLAGS = (SCALAR,KIDS)
            {
    11          TYPE = gvsv  ===> 12
                FLAGS = (SCALAR)
                GV = main::b
            }
        }

< finish this later >

=head2 Patching

All right, we've now had a look at how to navigate the Perl sources and
some things you'll need to know when fiddling with them. Let's now get
on and create a simple patch. Here's something Larry suggested: if a
C<U> is the first active format during a C<pack> (for example,
C<pack "U3C8", @stuff>), then the resulting string should be treated as
UTF8 encoded.

How do we prepare to fix this up? First we locate the code in question -
the C<pack> happens at runtime, so it's going to be in one of the F<pp>
files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be
altering this file, let's copy it to F<pp.c~>.

Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then
loop over the pattern, taking each format character in turn into
C<datumtype>. Then for each possible format character, we swallow up
the other arguments in the pattern (a field width, an asterisk, and so
on) and convert the next chunk of input into the specified format, adding
it onto the output SV C<cat>.

How do we know if the C<U> is the first format in the C<pat>? Well, if
we have a pointer to the start of C<pat> then, if we see a C<U> we can
test whether we're still at the start of the string. So, here's where
C<pat> is set up:

    STRLEN fromlen;
    register char *pat = SvPVx(*++MARK, fromlen);
    register char *patend = pat + fromlen;
    register I32 len;
    I32 datumtype;
    SV *fromstr;

We'll have another string pointer in there:

    STRLEN fromlen;
    register char *pat = SvPVx(*++MARK, fromlen);
    register char *patend = pat + fromlen;
 +  char *patcopy;
    register I32 len;
    I32 datumtype;
    SV *fromstr;

And just before we start the loop, we'll set C<patcopy> to be the start
of C<pat>:

    items = SP - MARK;
    MARK++;
    sv_setpvn(cat, "", 0);
 +  patcopy = pat;
    while (pat < patend) {

Now if we see a C<U> which was at the start of the string, we turn on
the UTF8 flag for the output SV, C<cat>:

 +  if (datumtype == 'U' && pat==patcopy+1)
 +      SvUTF8_on(cat);
    if (datumtype == '#') {
        while (pat < patend && *pat != '\n')
            pat++;

Remember that it has to be C<patcopy+1> because the first character of
the string is the C<U> which has been swallowed into C<datumtype>!

Oops, we forgot one thing: what if there are spaces at the start of the
pattern? C<pack("  U*", @stuff)> will have C<U> as the first active
character, even though it's not the first thing in the pattern. In this
case, we have to advance C<patcopy> along with C<pat> when we see spaces:

    if (isSPACE(datumtype))
        continue;

needs to become

    if (isSPACE(datumtype)) {
        patcopy++;
        continue;
    }
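
The pointer bookkeeping is easy to get wrong, so here is the same logic
pulled out into a standalone function you can compile and test on its
own. It is a sketch, not the code from F<pp.c>: the function name is
invented, C<isspace> stands in for C<isSPACE>, and counts such as C<3>
or C<*> are simply walked over as if they were formats instead of being
parsed.

```c
#include <ctype.h>

/* Return 1 if 'U' is the first active (non-space) format in a pack
 * pattern, 0 otherwise. Hypothetical helper, for illustration only. */
static int toy_first_active_is_U(const char *pat)
{
    const char *patcopy = pat;   /* start of pattern, adjusted for spaces */
    int utf8 = 0;

    while (*pat != '\0') {
        int datumtype = (unsigned char)*pat++;
        if (isspace(datumtype)) {
            patcopy++;           /* keep patcopy in step with pat */
            continue;
        }
        /* pat has already moved past the format character, hence +1 */
        if (datumtype == 'U' && pat == patcopy + 1)
            utf8 = 1;
        /* a real implementation would parse counts and pack data here */
    }
    return utf8;
}
```

With the C<patcopy++> in place, leading spaces no longer hide an initial
C<U>, while a C<U> that follows any other format still fails the
C<pat == patcopy + 1> test.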

OK. That's the C part done. Now we must do two additional things before
this patch is ready to go: we've changed the behaviour of Perl, and so
we must document that change. We must also provide some more regression
tests to make sure our patch works and doesn't create a bug somewhere
else along the line.

The regression tests for each operator live in F<t/op/>, and so we make
a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our tests
to the end. First, we'll test that the C<U> does indeed create Unicode
strings:

 print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000);
 print "ok $test\n"; $test++;

Now we'll test that we got that space-at-the-beginning business right:

 print 'not ' unless "1.20.300.4000" eq
                     sprintf "%vd", pack("  U*",1,20,300,4000);
 print "ok $test\n"; $test++;

And finally we'll test that we don't make Unicode strings if C<U> is B<not>
the first active format:

 print 'not ' unless v1.20.300.4000 ne
                     sprintf "%vd", pack("C0U*",1,20,300,4000);
 print "ok $test\n"; $test++;

Mustn't forget to change the number of tests which appears at the top, or
else the automated tester will get confused:

 -print "1..156\n";
 +print "1..159\n";

We now compile up Perl, and run it through the test suite. Our new
tests pass, hooray!

Finally, the documentation. The job is never done until the paperwork is
over, so let's describe the change we've just made. The relevant place
is F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert
this text in the description of C<pack>:

 =item *

 If the pattern begins with a C<U>, the resulting string will be treated
 as Unicode-encoded. You can force UTF8 encoding on in a string with an
 initial C<U0>, and the bytes that follow will be interpreted as Unicode
 characters. If you don't want this to happen, you can begin your pattern
 with C<C0> (or anything else) to force Perl not to UTF8 encode your
 string, and then follow this with a C<U*> somewhere in your pattern.

All done. Now let's create the patch. F<Porting/patching.pod> tells us
that if we're making major changes, we should copy the entire directory
to somewhere safe before we begin fiddling, and then do

    diff -ruN old new > patch

However, we know which files we've changed, and we can simply do this:

    diff -u pp.c~             pp.c             >  patch
    diff -u t/op/pack.t~      t/op/pack.t      >> patch
    diff -u pod/perlfunc.pod~ pod/perlfunc.pod >> patch

We end up with a patch looking a little like this:

    --- pp.c~       Fri Jun 02 04:34:10 2000
    +++ pp.c        Fri Jun 16 11:37:25 2000
    @@ -4375,6 +4375,7 @@
         register I32 items;
         STRLEN fromlen;
         register char *pat = SvPVx(*++MARK, fromlen);
    +    char *patcopy;
         register char *patend = pat + fromlen;
         register I32 len;
         I32 datumtype;
    @@ -4405,6 +4406,7 @@
    ...

And finally, we submit it, with our rationale, to perl5-porters. Job
done!

=head1 EXTERNAL TOOLS FOR DEBUGGING PERL

Sometimes it helps to use external tools while debugging and
testing Perl.  This section tries to guide you through using
some common testing and debugging tools with Perl.  This is
meant as a guide to interfacing these tools with Perl, not
as any kind of guide to the use of the tools themselves.

=head2 Rational Software's Purify

Purify is a commercial tool that is helpful in identifying
memory overruns, wild pointers, memory leaks and other such
badness.  Perl must be compiled in a specific way for
optimal testing with Purify.  Purify is available under
Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.

The only currently known leaks happen when there are
compile-time errors within eval or require.  (Fixing these
is non-trivial, unfortunately, but they must be fixed
eventually.)

=head2 Purify on Unix

On Unix, Purify creates a new Perl binary.  To get the most
benefit out of Purify, you should create the perl to Purify
using:

    sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
     -Uusemymalloc -Dusemultiplicity

where these arguments mean:

=over 4

=item -Accflags=-DPURIFY

Disables Perl's arena memory allocation functions, as well as
forcing use of memory allocation functions derived from the
system malloc.

=item -Doptimize='-g'

Adds debugging information so that you see the exact source
statements where the problem occurs.  Without this flag, all
you will see is the source filename of where the error occurred.

=item -Uusemymalloc

Disable Perl's malloc so that Purify can more closely monitor
allocations and leaks.  Using Perl's malloc will make Purify
report most leaks in the "potential" leaks category.

=item -Dusemultiplicity

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=back

Once you've compiled a perl suitable for Purify'ing, then you
can just:

    make pureperl   

which creates a binary named 'pureperl' that has been Purify'ed.
This binary is used in place of the standard 'perl' binary
when you want to debug Perl memory problems.

As an example, to show any memory leaks produced during the
standard Perl testset you would create and run the Purify'ed
perl as:

    make pureperl
    cd t
    ../pureperl -I../lib harness 

which would run Perl on test.pl and report any memory problems.

Purify outputs messages in "Viewer" windows by default.  If
you don't have a windowing environment or if you simply
want the Purify output to unobtrusively go to a log file
instead of to the interactive window, use the following
options to output to the log file "perl.log":

    setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
     -log-file=perl.log -append-logfile=yes"

If you plan to use the "Viewer" windows, then you only need this option:

    setenv PURIFYOPTIONS "-chain-length=25"

=head2 Purify on NT

Purify on Windows NT instruments the Perl binary 'perl.exe'
on the fly.  There are several options in the makefile you
should change to get the most use out of Purify:

=over 4

=item DEFINES

You should add -DPURIFY to the DEFINES line so the DEFINES
line looks something like:

    DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1 

to disable Perl's arena memory allocation functions, as
well as to force use of memory allocation functions derived
from the system malloc.

=item USE_MULTI = define

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=item #PERL_MALLOC = define

Disable Perl's malloc so that Purify can more closely monitor
allocations and leaks.  Using Perl's malloc will make Purify
report most leaks in the "potential" leaks category.

=item CFG = Debug

Adds debugging information so that you see the exact source
statements where the problem occurs.  Without this flag, all
you will see is the source filename of where the error occurred.

=back

As an example, to show any memory leaks produced during the
standard Perl testset you would create and run Purify as:

    cd win32
    make
    cd ../t
    purify ../perl -I../lib harness 

which would instrument Perl in memory, run Perl on test.pl,
then finally report any memory problems.

=head2 CONCLUSION

We've had a brief look around the Perl source and an overview of the stages
F<perl> goes through when it's running your code, and we've seen how to use a
debugger to poke at the Perl guts. We took a very simple problem and
demonstrated how to solve it fully - with documentation, regression
tests, and a patch for submission to p5p.  Finally, we talked
about how to use external tools to debug and test Perl.

I'd now suggest you read over those references again, and then, as soon
as possible, get your hands dirty. The best way to learn is by doing,
so: 

=over 3

=item *

Subscribe to perl5-porters, follow the patches and try and understand
them; don't be afraid to ask if there's a portion you're not clear on -
who knows, you may unearth a bug in the patch...

=item *

Keep up to date with the bleeding edge Perl distributions and get
familiar with the changes. Try and get an idea of what areas people are
working on and the changes they're making.

=item *

Do read the README associated with your operating system, e.g. README.aix
on the IBM AIX OS. Don't hesitate to supply patches to that README if
you find anything missing or changed over a new OS release.

=item *

Find an area of Perl that seems interesting to you, and see if you can
work out how it works. Scan through the source, and step over it in the
debugger. Play, poke, investigate, fiddle! You'll probably get to
understand not just your chosen area but a much wider range of F<perl>'s
activity as well, and probably sooner than you'd think.

=back

=over 3

=item I<The Road goes ever on and on, down from the door where it began.>

=back

If you can do these things, you've started on the long road to Perl porting. 
Thanks for wanting to help make Perl better - and happy hacking!

=head1 AUTHOR

This document was written by Nathan Torkington, and is maintained by
the perl5-porters mailing list.

=head1 NAME

perlhist - the Perl history records

=for RCS

=begin RCS

#
# $Id: perlhist.pod,v 1.2 2000/01/24 11:44:47 jhi Exp $
#

=end RCS

=head1 DESCRIPTION

This document aims to record the Perl source code releases.

=head1 INTRODUCTION

Perl history in brief, by Larry Wall:

    Perl 0 introduced Perl to my officemates.
    Perl 1 introduced Perl to the world, and changed /\(...\|...\)/ to
        /(...|...)/.  \(Dan Faigin still hasn't forgiven me. :-\)
    Perl 2 introduced Henry Spencer's regular expression package.
    Perl 3 introduced the ability to handle binary data (embedded nulls).
    Perl 4 introduced the first Camel book.  Really.  We mostly just
        switched version numbers so the book could refer to 4.000.
    Perl 5 introduced everything else, including the ability to
        introduce everything else.

=head1 THE KEEPERS OF THE PUMPKIN

Larry Wall, Andy Dougherty, Tom Christiansen, Charles Bailey, Nick
Ing-Simmons, Chip Salzenberg, Tim Bunce, Malcolm Beattie, Gurusamy
Sarathy, Graham Barr, Jarkko Hietaniemi.

=head2 PUMPKIN?

[from Porting/pumpkin.pod in the Perl source code distribution]

Chip Salzenberg gets credit for that, with a nod to his cow orker,
David Croy.  We had passed around various names (baton, token, hot
potato) but none caught on.  Then, Chip asked:

[begin quote]

   Who has the patch pumpkin?

To explain:  David Croy once told me once that at a previous job,
there was one tape drive and multiple systems that used it for backups.
But instead of some high-tech exclusion software, they used a low-tech
method to prevent multiple simultaneous backups: a stuffed pumpkin.
No one was allowed to make backups unless they had the "backup pumpkin".

[end quote]

The name has stuck.  The holder of the pumpkin is sometimes called
the pumpking (keeping the source afloat?) or the pumpkineer (pulling
the strings?).

=head1 THE RECORDS

 Pump-  Release         Date            Notes
 king                                   (by no means
                                         comprehensive,
                                         see Changes*
                                         for details)
 ===========================================================================

 Larry   0              Classified.     Don't ask.

 Larry   1.000          1987-Dec-18

          1.001..10     1988-Jan-30
          1.011..14     1988-Feb-02

 Larry   2.000          1988-Jun-05

          2.001         1988-Jun-28

 Larry   3.000          1989-Oct-18

          3.001         1989-Oct-26
          3.002..4      1989-Nov-11
          3.005         1989-Nov-18
          3.006..8      1989-Dec-22
          3.009..13     1990-Mar-02
          3.014         1990-Mar-13
          3.015         1990-Mar-14
          3.016..18     1990-Mar-28
          3.019..27     1990-Aug-10     User subs.
          3.028         1990-Aug-14
          3.029..36     1990-Oct-17
          3.037         1990-Oct-20
          3.040         1990-Nov-10
          3.041         1990-Nov-13
          3.042..43     1991-Jan-??
          3.044         1991-Jan-12

 Larry   4.000          1991-Mar-21

          4.001..3      1991-Apr-12
          4.004..9      1991-Jun-07
          4.010         1991-Jun-10
          4.011..18     1991-Nov-05
          4.019         1991-Nov-11     Stable.
          4.020..33     1992-Jun-08
          4.034         1992-Jun-11
          4.035         1992-Jun-23
 Larry    4.036         1993-Feb-05     Very stable.

          5.000alpha1   1993-Jul-31
          5.000alpha2   1993-Aug-16
          5.000alpha3   1993-Oct-10
          5.000alpha4   1993-???-??
          5.000alpha5   1993-???-??
          5.000alpha6   1994-Mar-18
          5.000alpha7   1994-Mar-25
 Andy     5.000alpha8   1994-Apr-04
 Larry    5.000alpha9   1994-May-05     ext appears.
          5.000alpha10  1994-Jun-11
          5.000alpha11  1994-Jul-01
 Andy     5.000a11a     1994-Jul-07     To fit 14.
          5.000a11b     1994-Jul-14
          5.000a11c     1994-Jul-19
          5.000a11d     1994-Jul-22
 Larry    5.000alpha12  1994-Aug-04
 Andy     5.000a12a     1994-Aug-08
          5.000a12b     1994-Aug-15
          5.000a12c     1994-Aug-22
          5.000a12d     1994-Aug-22
          5.000a12e     1994-Aug-22
          5.000a12f     1994-Aug-24
          5.000a12g     1994-Aug-24
          5.000a12h     1994-Aug-24
 Larry    5.000beta1    1994-Aug-30
 Andy     5.000b1a      1994-Sep-06
 Larry    5.000beta2    1994-Sep-14     Core slushified.
 Andy     5.000b2a      1994-Sep-14
          5.000b2b      1994-Sep-17
          5.000b2c      1994-Sep-17
 Larry    5.000beta3    1994-Sep-??
 Andy     5.000b3a      1994-Sep-18
          5.000b3b      1994-Sep-22
          5.000b3c      1994-Sep-23
          5.000b3d      1994-Sep-27
          5.000b3e      1994-Sep-28
          5.000b3f      1994-Sep-30
          5.000b3g      1994-Oct-04
 Andy     5.000b3h      1994-Oct-07
 Larry?   5.000gamma    1994-Oct-13?

 Larry   5.000          1994-Oct-17

 Andy     5.000a        1994-Dec-19
          5.000b        1995-Jan-18
          5.000c        1995-Jan-18
          5.000d        1995-Jan-18
          5.000e        1995-Jan-18
          5.000f        1995-Jan-18
          5.000g        1995-Jan-18
          5.000h        1995-Jan-18
          5.000i        1995-Jan-26
          5.000j        1995-Feb-07
          5.000k        1995-Feb-11
          5.000l        1995-Feb-21
          5.000m        1995-Feb-28
          5.000n        1995-Mar-07
          5.000o        1995-Mar-13?

 Larry   5.001          1995-Mar-13

 Andy     5.001a        1995-Mar-15
          5.001b        1995-Mar-31
          5.001c        1995-Apr-07
          5.001d        1995-Apr-14
          5.001e        1995-Apr-18     Stable.
          5.001f        1995-May-31
          5.001g        1995-May-25
          5.001h        1995-May-25
          5.001i        1995-May-30
          5.001j        1995-Jun-05
          5.001k        1995-Jun-06
          5.001l        1995-Jun-06     Stable.
          5.001m        1995-Jul-02     Very stable.
          5.001n        1995-Oct-31     Very unstable.
          5.002beta1    1995-Nov-21
          5.002b1a      1995-Dec-04
          5.002b1b      1995-Dec-04
          5.002b1c      1995-Dec-04
          5.002b1d      1995-Dec-04
          5.002b1e      1995-Dec-08
          5.002b1f      1995-Dec-08
 Tom      5.002b1g      1995-Dec-21     Doc release.
 Andy     5.002b1h      1996-Jan-05
          5.002b2       1996-Jan-14
 Larry    5.002b3       1996-Feb-02
 Andy     5.002gamma    1996-Feb-11
 Larry    5.002delta    1996-Feb-27

 Larry   5.002          1996-Feb-29     Prototypes.

 Charles  5.002_01      1996-Mar-25

         5.003          1996-Jun-25     Security release.

          5.003_01      1996-Jul-31
 Nick     5.003_02      1996-Aug-10
 Andy     5.003_03      1996-Aug-28
          5.003_04      1996-Sep-02
          5.003_05      1996-Sep-12
          5.003_06      1996-Oct-07
          5.003_07      1996-Oct-10
 Chip     5.003_08      1996-Nov-19
          5.003_09      1996-Nov-26
          5.003_10      1996-Nov-29
          5.003_11      1996-Dec-06
          5.003_12      1996-Dec-19
          5.003_13      1996-Dec-20
          5.003_14      1996-Dec-23
          5.003_15      1996-Dec-23
          5.003_16      1996-Dec-24
          5.003_17      1996-Dec-27
          5.003_18      1996-Dec-31
          5.003_19      1997-Jan-04
          5.003_20      1997-Jan-07
          5.003_21      1997-Jan-15
          5.003_22      1997-Jan-16
          5.003_23      1997-Jan-25
          5.003_24      1997-Jan-29
          5.003_25      1997-Feb-04
          5.003_26      1997-Feb-10
          5.003_27      1997-Feb-18
          5.003_28      1997-Feb-21
          5.003_90      1997-Feb-25     Ramping up to the 5.004 release.
          5.003_91      1997-Mar-01
          5.003_92      1997-Mar-06
          5.003_93      1997-Mar-10
          5.003_94      1997-Mar-22
          5.003_95      1997-Mar-25
          5.003_96      1997-Apr-01
          5.003_97      1997-Apr-03     Fairly widely used.
          5.003_97a     1997-Apr-05
          5.003_97b     1997-Apr-08
          5.003_97c     1997-Apr-10
          5.003_97d     1997-Apr-13
          5.003_97e     1997-Apr-15
          5.003_97f     1997-Apr-17
          5.003_97g     1997-Apr-18
          5.003_97h     1997-Apr-24
          5.003_97i     1997-Apr-25
          5.003_97j     1997-Apr-28
          5.003_98      1997-Apr-30
          5.003_99      1997-May-01
          5.003_99a     1997-May-09
          p54rc1        1997-May-12     Release Candidates.
          p54rc2        1997-May-14

 Chip    5.004          1997-May-15     A major maintenance release.

 Tim      5.004_01      1997-Jun-13     The 5.004 maintenance track.
          5.004_02      1997-Aug-07
          5.004_03      1997-Sep-05
          5.004_04      1997-Oct-15
          5.004m5t1     1998-Mar-04     Maintenance Trials (for 5.004_05).
          5.004_04-m2   1997-May-01
          5.004_04-m3   1998-May-15
          5.004_04-m4   1998-May-19
          5.004_04-MT5  1998-Jul-21
          5.004_04-MT6  1998-Oct-09
          5.004_04-MT7  1998-Nov-22
          5.004_04-MT8  1998-Dec-03
 Chip     5.004_04-MT9  1999-Apr-26
          5.004_05      1999-Apr-29

 Malcolm  5.004_50      1997-Sep-09     The 5.005 development track.
          5.004_51      1997-Oct-02
          5.004_52      1997-Oct-15
          5.004_53      1997-Oct-16
          5.004_54      1997-Nov-14
          5.004_55      1997-Nov-25
          5.004_56      1997-Dec-18
          5.004_57      1998-Feb-03
          5.004_58      1998-Feb-06
          5.004_59      1998-Feb-13
          5.004_60      1998-Feb-20
          5.004_61      1998-Feb-27
          5.004_62      1998-Mar-06
          5.004_63      1998-Mar-17
          5.004_64      1998-Apr-03
          5.004_65      1998-May-15
          5.004_66      1998-May-29
 Sarathy  5.004_67      1998-Jun-15
          5.004_68      1998-Jun-23
          5.004_69      1998-Jun-29
          5.004_70      1998-Jul-06
          5.004_71      1998-Jul-09
          5.004_72      1998-Jul-12
          5.004_73      1998-Jul-13
          5.004_74      1998-Jul-14     5.005 beta candidate.
          5.004_75      1998-Jul-15     5.005 beta1.
          5.004_76      1998-Jul-21     5.005 beta2.
          5.005         1998-Jul-22     Oneperl.

 Sarathy  5.005_01      1998-Jul-27     The 5.005 maintenance track.
          5.005_02-T1   1998-Aug-02
          5.005_02-T2   1998-Aug-05
          5.005_02      1998-Aug-08
 Graham   5.005_03-MT1  1998-Nov-30
          5.005_03-MT2  1999-Jan-04
          5.005_03-MT3  1999-Jan-17
          5.005_03-MT4  1999-Jan-26
          5.005_03-MT5  1999-Jan-28
          5.005_03      1999-Mar-28
 Chip     5.005_04      2000-***-**

 Sarathy  5.005_50      1998-Jul-26     The 5.6 development track.
          5.005_51      1998-Aug-10
          5.005_52      1998-Sep-25
          5.005_53      1998-Oct-31
          5.005_54      1998-Nov-30
          5.005_55      1999-Feb-16
          5.005_56      1999-Mar-01
          5.005_57      1999-May-25
          5.005_58      1999-Jul-27
          5.005_59      1999-Aug-02
          5.005_60      1999-Aug-02
          5.005_61      1999-Aug-20
          5.005_62      1999-Oct-15
          5.005_63      1999-Dec-09
          5.5.640       2000-Feb-02
          5.5.650       2000-Feb-08     beta1
          5.5.660       2000-Feb-22     beta2
          5.5.670       2000-Feb-29     beta3
          5.6.0-RC1     2000-Mar-09     release candidate 1
          5.6.0-RC2     2000-Mar-14     release candidate 2
          5.6.0-RC3     2000-Mar-21     release candidate 3
          5.6.0         2000-Mar-22

 Sarathy  5.6.1-TRIAL1  2000-Dec-18     The 5.6 maintenance track.
          5.6.1-TRIAL2  2001-Jan-31
          5.6.1-TRIAL3  2001-Mar-19
          5.6.1-foolish 2001-Apr-01     The "fools-gold" release.
          5.6.1         2001-Apr-08

 Jarkko   5.7.0         2000-Sep-02     The 5.7 track: Development.

=head2 SELECTED RELEASE SIZES

For example, the notation "core: 212  29" for release 1.000 means that
the core contained 212 kilobytes in 29 files.  The categories "core"
through "doc" are explained below.

 release        core       lib         ext        t         doc
 ======================================================================

 1.000           212  29      -   -      -   -     38  51     62   3
 1.014           219  29      -   -      -   -     39  52     68   4
 2.000           309  31      2   3      -   -     55  57     92   4
 2.001           312  31      2   3      -   -     55  57     94   4
 3.000           508  36     24  11      -   -     79  73    156   5
 3.044           645  37     61  20      -   -     90  74    190   6
 4.000           635  37     59  20      -   -     91  75    198   4
 4.019           680  37     85  29      -   -     98  76    199   4
 4.036           709  37     89  30      -   -     98  76    208   5
 5.000alpha2     785  50    114  32      -   -    112  86    209   5
 5.000alpha3     801  50    117  33      -   -    121  87    209   5
 5.000alpha9    1022  56    149  43    116  29    125  90    217   6
 5.000a12h       978  49    140  49    205  46    152  97    228   9
 5.000b3h       1035  53    232  70    216  38    162  94    218  21
 5.000          1038  53    250  76    216  38    154  92    536  62
 5.001m         1071  54    388  82    240  38    159  95    544  29
 5.002          1121  54    661 101    287  43    155  94    847  35
 5.003          1129  54    680 102    291  43    166 100    853  35
 5.003_07       1231  60    748 106    396  53    213 137    976  39
 5.004          1351  60   1230 136    408  51    355 161   1587  55
 5.004_01       1356  60   1258 138    410  51    358 161   1587  55
 5.004_04       1375  60   1294 139    413  51    394 162   1629  55
 5.004_05       1463  60   1435 150    394  50    445 175   1855  59
 5.004_51       1401  61   1260 140    413  53    358 162   1594  56
 5.004_53       1422  62   1295 141    438  70    394 162   1637  56
 5.004_56       1501  66   1301 140    447  74    408 165   1648  57
 5.004_59       1555  72   1317 142    448  74    424 171   1678  58
 5.004_62       1602  77   1327 144    629  92    428 173   1674  58
 5.004_65       1626  77   1358 146    615  92    446 179   1698  60
 5.004_68       1856  74   1382 152    619  92    463 187   1784  60
 5.004_70       1863  75   1456 154    675  92    494 194   1809  60
 5.004_73       1874  76   1467 152    762 102    506 196   1883  61
 5.004_75       1877  76   1467 152    770 103    508 196   1896  62
 5.005          1896  76   1469 152    795 103    509 197   1945  63
 5.005_03       1936  77   1541 153    813 104    551 201   2176  72
 5.005_50       1969  78   1842 301    795 103    514 198   1948  63
 5.005_53       1999  79   1885 303    806 104    602 224   2002  67
 5.005_56       2086  79   1970 307    866 113    672 238   2221  75

The categories "core" through "doc" refer to the following files in the
Perl source code distribution.  In the glob notation, ** means
"recursively" and (.) means "regular files only".

 core   *.[hcy]
 lib    lib/**/*.p[ml]
 ext    ext/**/*.{[hcyt],xs,pm}
 t      t/**/*(.)
 doc    {README*,INSTALL,*[_.]man{,.?},pod/**/*.pod}
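
As a rough illustration of how one such pair of numbers could be
reproduced, the following sketch tallies kilobytes and file counts for
a pattern like C<lib/**/*.p[ml]> using C<File::Find>.  The C<tally>
helper, the rounding to whole kilobytes, and the throwaway directory
tree are all assumptions made for this example; the tool actually used
to build the tables is not described here.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: return (kilobytes, file count) for the regular
# files under $dir whose names match $pattern.  Rounding to the nearest
# whole kilobyte is an assumption, not something the tables specify.
sub tally {
    my ($dir, $pattern) = @_;
    my ($bytes, $files) = (0, 0);
    find(sub {
        return unless -f $_ && $_ =~ $pattern;   # regular files only
        $bytes += -s _;                          # reuse the stat from -f
        $files++;
    }, $dir);
    return (int($bytes / 1024 + 0.5), $files);
}

# Build a small throwaway tree so that the example is self-contained.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/lib"      or die "mkdir: $!";
mkdir "$root/lib/File" or die "mkdir: $!";
for my $name (qw(lib/strict.pm lib/File/Find.pm lib/getopts.pl lib/README)) {
    open(my $fh, '>', "$root/$name") or die "Can't write $name: $!";
    print $fh 'x' x 1024;                        # exactly 1 kB each
    close($fh) or die "close: $!";
}

# Equivalent of the "lib" row: lib/**/*.p[ml]
my ($kb, $files) = tally("$root/lib", qr/\.p[ml]\z/);
print "lib: $kb kB in $files files\n";           # README is not counted
```

Run against a real source tree, the same helper could be pointed at
F<ext> or F<t> with a different pattern, though the results need not
match the tables exactly if the original counting method differed.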

Here are some statistics for the other subdirectories and one file in
the Perl source distribution, for a smaller selection of releases.

 ======================================================================
   Legend:  kB   #

            1.014   2.001   3.044   4.000   4.019   4.036

 atarist      -  -    -  -    -  -    -  -    -  -  113 31
 Configure   31  1   37  1   62  1   73  1   83  1   86  1
 eg           -  -   34 28   47 39   47 39   47 39   47 39
 emacs        -  -    -  -    -  -   67  4   67  4   67  4
 h2pl         -  -    -  -   12 12   12 12   12 12   12 12
 hints        -  -    -  -    -  -    -  -    5 42   11 56
 msdos        -  -    -  -   41 13   57 15   58 15   60 15
 os2          -  -    -  -   63 22   81 29   81 29  113 31
 usub         -  -    -  -   21 16   25  7   43  8   43  8
 x2p        103 17  104 17  137 17  147 18  152 19  154 19

 ======================================================================

            5.000a2 5.000a12h 5.000b3h 5.000  5.001m  5.002   5.003

 atarist    113 31  113 31    -  -      -  -    -  -    -  -    -  -
 bench        -  -    0  1    -  -      -  -    -  -    -  -    -  -
 Bugs         2  5   26  1    -  -      -  -    -  -    -  -    -  -
 dlperl      40  5    -  -    -  -      -  -    -  -    -  -    -  -
 do         127 71    -  -    -  -      -  -    -  -    -  -    -  -
 Configure    -  -  153  1  159  1    160  1  180  1  201  1  201  1
 Doc          -  -   26  1   75  7     11  1   11  1    -  -    -  -
 eg          79 58   53 44   51 43     54 44   54 44   54 44   54 44
 emacs       67  4  104  6  104  6    104  1  104  6  108  1  108  1
 h2pl        12 12   12 12   12 12     12 12   12 12   12 12   12 12
 hints       11 56   12 46   18 48     18 48   44 56   73 59   77 60
 msdos       60 15   60 15    -  -      -  -    -  -    -  -    -  -
 os2        113 31  113 31    -  -      -  -    -  -   84 17   56 10
 U            -  -   62  8  112 42      -  -    -  -    -  -    -  -
 usub        43  8    -  -    -  -      -  -    -  -    -  -    -  -
 utils        -  -    -  -    -  -      -  -    -  -   87  7   88  7
 vms          -  -   80  7  123  9    184 15  304 20  500 24  475 26
 x2p        171 22  171 21  162 20    162 20  279 20  280 20  280 20

 ======================================================================

            5.003_07 5.004   5.004_04 5.004_62 5.004_65 5.004_68

 beos         -  -     -  -    -  -     -  -     1   1    1   1
 Configure  217  1   225  1  225  1   240  1   248   1  256   1
 cygwin32     -  -    23  5   23  5    23  5    24   5   24   5
 djgpp        -  -     -  -    -  -    14  5    14   5   14   5
 eg          54 44    81 62   81 62    81 62    81  62   81  62
 emacs      143  1   194  1  204  1   212  2   212   2  212   2
 h2pl        12 12    12 12   12 12    12 12    12  12   12  12
 hints       90 62   129 69  132 71   144 72   151  74  155  74
 os2        117 42   121 42  127 42   127 44   129  44  129  44
 plan9       79 15    82 15   82 15    82 15    82  15   82  15
 Porting     51  1    94  2  109  4   203  6   234   8  241   9
 qnx          -  -     1  2    1  2     1  2     1   2    1   2
 utils       97  7   112  8  118  8   124  8   156   9  159   9
 vms        505 27   518 34  524 34   538 34   569  34  569  34
 win32        -  -   285 33  378 36   470 39   493  39  575  41
 x2p        280 19   281 19  281 19   281 19   282  19  281  19

 ======================================================================

            5.004_70 5.004_73 5.004_75  5.005  5.005_03

 apollo       -   -    -   -    -   -    -   -    0   1
 beos         1   1    1   1    1   1    1   1    1   1
 Configure  256   1  256   1  264   1  264   1  270   1
 cygwin32    24   5   24   5   24   5   24   5   24   5
 djgpp       14   5   14   5   14   5   14   5   15   5
 eg          86  65   86  65   86  65   86  65   86  65
 emacs      262   2  262   2  262   2  262   2  274   2
 h2pl        12  12   12  12   12  12   12  12   12  12
 hints      157  74  157  74  159  74  160  74  179  77
 mint         -   -    -   -    -   -    -   -    4   7
 mpeix        -   -    -   -    5   3    5   3    5   3
 os2        129  44  139  44  142  44  143  44  148  44
 plan9       82  15   82  15   82  15   82  15   82  15
 Porting    241   9  253   9  259  10  264  12  272  13
 qnx          1   2    1   2    1   2    1   2    1   2
 utils      160   9  160   9  160   9  160   9  164   9
 vms        570  34  572  34  573  34  575  34  583  34
 vos          -   -    -   -    -   -    -   -  156  10
 win32      577  41  585  41  585  41  587  41  600  42
 x2p        281  19  281  19  281  19  281  19  281  19

=head2 SELECTED PATCH SIZES

The "diff lines kB" columns mean, for example, that the patch 5.003_08,
applied on top of 5.003_07 (or whatever preceded 5.003_08), added
lines amounting to 110 kilobytes, removed lines amounting to 19
kilobytes, and changed lines amounting to 424 kilobytes.  Only the lines
themselves are counted, not their context.  The "+ - !" markers come
from the diff(1) context diff output format.

 Pump-  Release         Date           diff lines kB
 king                                  -------------
                                          +   -   !
 ===========================================================================

 Chip     5.003_08      1996-Nov-19     110  19 424
          5.003_09      1996-Nov-26      38   9 248
          5.003_10      1996-Nov-29      29   2  27
          5.003_11      1996-Dec-06      73  12 165
          5.003_12      1996-Dec-19     275   6 436
          5.003_13      1996-Dec-20      95   1  56
          5.003_14      1996-Dec-23      23   7 333
          5.003_15      1996-Dec-23       0   0   1
          5.003_16      1996-Dec-24      12   3  50
          5.003_17      1996-Dec-27      19   1  14
          5.003_18      1996-Dec-31      21   1  32
          5.003_19      1997-Jan-04      80   3  85
          5.003_20      1997-Jan-07      18   1 146
          5.003_21      1997-Jan-15      38  10 221
          5.003_22      1997-Jan-16       4   0  18
          5.003_23      1997-Jan-25      71  15 119
          5.003_24      1997-Jan-29     426   1  20
          5.003_25      1997-Feb-04      21   8 169
          5.003_26      1997-Feb-10      16   1  15
          5.003_27      1997-Feb-18      32  10  38
          5.003_28      1997-Feb-21      58   4  66
          5.003_90      1997-Feb-25      22   2  34
          5.003_91      1997-Mar-01      37   1  39
          5.003_92      1997-Mar-06      16   3  69
          5.003_93      1997-Mar-10      12   3  15
          5.003_94      1997-Mar-22     407   7 200
          5.003_95      1997-Mar-25      41   1  37
          5.003_96      1997-Apr-01     283   5 261
          5.003_97      1997-Apr-03      13   2  34
          5.003_97a     1997-Apr-05      57   1  27
          5.003_97b     1997-Apr-08      14   1  20
          5.003_97c     1997-Apr-10      20   1  16
          5.003_97d     1997-Apr-13       8   0  16
          5.003_97e     1997-Apr-15      15   4  46
          5.003_97f     1997-Apr-17       7   1  33
          5.003_97g     1997-Apr-18       6   1  42
          5.003_97h     1997-Apr-24      23   3  68
          5.003_97i     1997-Apr-25      23   1  31
          5.003_97j     1997-Apr-28      36   1  49
          5.003_98      1997-Apr-30     171  12 539
          5.003_99      1997-May-01       6   0   7
          5.003_99a     1997-May-09      36   2  61
          p54rc1        1997-May-12       8   1  11
          p54rc2        1997-May-14       6   0  40

        5.004           1997-May-15       4   0   4

 Tim      5.004_01      1997-Jun-13     222  14  57
          5.004_02      1997-Aug-07     112  16 119
          5.004_03      1997-Sep-05     109   0  17
          5.004_04      1997-Oct-15      66   8 173

=head1 THE KEEPERS OF THE RECORDS

Jarkko Hietaniemi <F<jhi@iki.fi>>.

Thanks to the collective memory of the Perlfolk.  In addition to the
Keepers of the Pumpkin, Alan Champion, Andreas KE<ouml>nig, John
Macdonald, Matthias Neeracher, Jeff Okamoto, Michael Peppler,
Randal Schwartz, and Paul D. Smith also sent corrections and additions.

=cut
=head1 NAME

perlintern - autogenerated documentation of purely B<internal> Perl functions

=head1 DESCRIPTION

This file is the autogenerated documentation of functions in the 
Perl interpreter that are documented using Perl's internal documentation
format but are not marked as part of the Perl API. In other words, 
B<they are not for use in extensions>!

=over 8

=item is_gv_magical

Returns C<TRUE> if given the name of a magical GV.

Currently only useful internally when determining if a GV should be
created even in rvalue contexts.

C<flags> is not used at present but available for future extension to
allow selecting particular classes of magical variable.

	bool	is_gv_magical(char *name, STRLEN len, U32 flags)

=for hackers
Found in file gv.c

=item LVRET

True if this op will be the return value of an lvalue subroutine.

=for hackers
Found in file pp.h

=item PL_DBsingle

When Perl is run in debugging mode, with the B<-d> switch, this SV is a
boolean which indicates whether subs are being single-stepped. 
Single-stepping is automatically turned on after every step.  This is the C
variable which corresponds to Perl's $DB::single variable.  See
C<PL_DBsub>.

	SV *	PL_DBsingle

=for hackers
Found in file intrpvar.h

=item PL_DBsub

When Perl is run in debugging mode, with the B<-d> switch, this GV contains
the SV which holds the name of the sub being debugged.  This is the C
variable which corresponds to Perl's $DB::sub variable.  See
C<PL_DBsingle>.

	GV *	PL_DBsub

=for hackers
Found in file intrpvar.h

=item PL_DBtrace

Trace variable used when Perl is run in debugging mode, with the B<-d>
switch.  This is the C variable which corresponds to Perl's $DB::trace
variable.  See C<PL_DBsingle>.

	SV *	PL_DBtrace

=for hackers
Found in file intrpvar.h

=item PL_dowarn

The C variable which corresponds to Perl's $^W warning variable.

	bool	PL_dowarn

=for hackers
Found in file intrpvar.h

=item PL_last_in_gv

The GV which was last used for a filehandle input operation. (C<< <FH> >>)

	GV*	PL_last_in_gv

=for hackers
Found in file thrdvar.h

=item PL_ofs_sv

The output field separator - C<$,> in Perl space.

	SV*	PL_ofs_sv

=for hackers
Found in file thrdvar.h

=item PL_rs

The input record separator - C<$/> in Perl space.

	SV*	PL_rs

=for hackers
Found in file thrdvar.h

=back

=head1 AUTHORS

The autodocumentation system was originally added to the Perl core by 
Benjamin Stuhl. Documentation is by whoever was kind enough to 
document their functions.

=head1 SEE ALSO

perlguts(1), perlapi(1)

=head1 NAME

perlipc - Perl interprocess communication (signals, fifos, pipes, safe subprocesses, sockets, and semaphores)

=head1 DESCRIPTION

The basic IPC facilities of Perl are built out of the good old Unix
signals, named pipes, pipe opens, the Berkeley socket routines, and SysV
IPC calls.  Each is used in slightly different situations.

=head1 Signals

Perl uses a simple signal handling model: the %SIG hash contains names or
references of user-installed signal handlers.  Each handler is called
with one argument, the name of the signal that triggered it.  A
signal may be generated intentionally from a particular keyboard sequence like
control-C or control-Z, sent to you from another process, or
triggered automatically by the kernel when special events transpire, like
a child process exiting, your process running out of stack space, or
hitting a file size limit.

For example, to trap an interrupt signal, set up a handler like this.
Do as little as you possibly can in your handler; notice how all we do is
set a global variable and then raise an exception.  That's because on most
systems, libraries are not re-entrant; particularly, memory allocation and
I/O routines are not.  That means that doing nearly I<anything> in your
handler could in theory trigger a memory fault and subsequent core dump.

    sub catch_zap {
	my $signame = shift;
	$shucks++;
	die "Somebody sent me a SIG$signame";
    }
    $SIG{INT} = 'catch_zap';  # could fail in modules
    $SIG{INT} = \&catch_zap;  # best strategy

The names of the signals are the ones listed out by C<kill -l> on your
system, or you can retrieve them from the Config module.  Set up an
@signame list indexed by number to get the name and a %signo table
indexed by name to get the number:

    use Config;
    defined $Config{sig_name} || die "No sigs?";
    foreach $name (split(' ', $Config{sig_name})) {
	$signo{$name} = $i;
	$signame[$i] = $name;
	$i++;
    }

So to check whether signal 17 and SIGALRM were the same, do just this:

    print "signal #17 = $signame[17]\n";
    if ($signo{ALRM}) {
	print "SIGALRM is $signo{ALRM}\n";
    }

You may also choose to assign the strings C<'IGNORE'> or C<'DEFAULT'> as
the handler, in which case Perl will try to discard the signal or do the
default thing.

On most Unix platforms, the C<CHLD> (sometimes also known as C<CLD>) signal
has special behavior with respect to a value of C<'IGNORE'>.
Setting C<$SIG{CHLD}> to C<'IGNORE'> on such a platform has the effect of
not creating zombie processes when the parent process fails to C<wait()>
on its child processes (i.e. child processes are automatically reaped).
Calling C<wait()> with C<$SIG{CHLD}> set to C<'IGNORE'> usually returns
C<-1> on such platforms.
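
As a minimal sketch of that behavior, assuming a platform that reaps
children automatically when C<$SIG{CHLD}> is C<'IGNORE'> (the variable
names are only illustrative), the parent below finds nothing left to
wait for:

    $SIG{CHLD} = 'IGNORE';      # ask the system to reap children for us

    defined(my $pid = fork)     or die "can't fork: $!";
    exit 0 if $pid == 0;        # child: exit at once

    # parent: on platforms with this behavior no zombie is left
    # behind, so wait() usually returns -1 here
    my $reaped = wait;
    print "wait returned $reaped\n";

On a system without this special treatment of C<CHLD>, the same code
would instead report the pid of the child.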

Some signals can be neither trapped nor ignored, such as
the KILL and STOP (but not the TSTP) signals.  One strategy for
temporarily ignoring signals is to use a local() statement, which will be
automatically restored once your block is exited.  (Remember that local()
values are "inherited" by functions called from within that block.)

    sub precious {
	local $SIG{INT} = 'IGNORE';
	&more_functions;
    }
    sub more_functions {
	# interrupts still ignored, for now...
    }

Sending a signal to a negative process ID means that you send the signal
to the entire Unix process-group.  This code sends a hang-up signal to all
processes in the current process group (and sets $SIG{HUP} to IGNORE so
it doesn't kill itself):

    {
	local $SIG{HUP} = 'IGNORE';
	kill HUP => -$$;
	# snazzy writing of: kill('HUP', -$$)
    }

Another interesting signal to send is signal number zero.  This doesn't
actually affect another process, but instead checks whether it's alive
or has changed its UID.

    unless (kill 0 => $kid_pid) {
	warn "something wicked happened to $kid_pid";
    }

You might also want to employ anonymous functions for simple signal
handlers:

    $SIG{INT} = sub { die "\nOutta here!\n" };

But that will be problematic for the more complicated handlers that need
to reinstall themselves.  Because Perl's signal mechanism is currently
based on the signal(3) function from the C library, you may sometimes be so
unfortunate as to run on systems where that function is "broken", that
is, it behaves in the old unreliable SysV way rather than the newer, more
reasonable BSD and POSIX fashion.  So you'll see defensive people writing
signal handlers like this:

    sub REAPER {
	$waitedpid = wait;
	# loathe sysV: it makes us not only reinstate
	# the handler, but place it after the wait
	$SIG{CHLD} = \&REAPER;
    }
    $SIG{CHLD} = \&REAPER;
    # now do something that forks...

or even the more elaborate:

    use POSIX ":sys_wait_h";
    sub REAPER {
	my $child;
        while (($child = waitpid(-1,WNOHANG)) > 0) {
	    $Kid_Status{$child} = $?;
	}
	$SIG{CHLD} = \&REAPER;  # still loathe sysV
    }
    $SIG{CHLD} = \&REAPER;
    # do something that forks...

Signal handling is also used for timeouts in Unix.  While safely
protected within an C<eval{}> block, you set a signal handler to trap
alarm signals and then schedule to have one delivered to you in some
number of seconds.  Then try your blocking operation, clearing the alarm
when it's done but not before you've exited your C<eval{}> block.  If it
goes off, you'll use die() to jump out of the block, much as you might
using longjmp() or throw() in other languages.

Here's an example:

    eval {
        local $SIG{ALRM} = sub { die "alarm clock restart" };
        alarm 10;
        flock(FH, 2);   # blocking write lock
        alarm 0;
    };
    if ($@ and $@ !~ /alarm clock restart/) { die }

If the operation being timed out is system() or qx(), this technique
is liable to generate zombies.    If this matters to you, you'll
need to do your own fork() and exec(), and kill the errant child process.
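
One way to do that might look like the following sketch.  It assumes a
sleep(1) command is available to stand in for the slow program, and the
two-second limit is arbitrary; neither comes from a real application.

    defined(my $pid = fork)     or die "can't fork: $!";
    if ($pid == 0) {            # child: become the slow command
        exec('sleep', '30')     or die "can't exec sleep: $!";
    }

    my $timed_out = 0;
    eval {
        local $SIG{ALRM} = sub { die "alarm clock restart\n" };
        alarm 2;                # assumed time limit, in seconds
        waitpid($pid, 0);       # the blocking operation
        alarm 0;
    };
    if ($@) {
        die $@ unless $@ eq "alarm clock restart\n";
        $timed_out = 1;
        kill 'TERM', $pid;      # get rid of the errant child...
        waitpid($pid, 0);       # ...and reap it so no zombie remains
    }
    print $timed_out ? "child $pid timed out\n" : "child $pid finished\n";

Because the parent knows the child's pid, it can both kill the child and
collect its exit status, which system() and qx() give you no chance to do
once the alarm has fired.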

For more complex signal handling, you might see the standard POSIX
module.  Lamentably, this is almost entirely undocumented, but
the F<t/lib/posix.t> file from the Perl source distribution has some
examples in it.
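
As one small illustration of what the POSIX module offers, this sketch
uses C<sigprocmask> to hold back an interrupt during a critical section.
It assumes a platform with POSIX signal masks; the variable names are
ours.

    use POSIX qw(:signal_h);

    my $caught = 0;
    $SIG{INT} = sub { $caught++ };

    my $block = POSIX::SigSet->new(SIGINT);   # signals to hold back
    my $saved = POSIX::SigSet->new;           # receives the old mask

    sigprocmask(SIG_BLOCK, $block, $saved)
        or die "can't block SIGINT: $!";

    kill INT => $$;             # stays pending while SIGINT is blocked
    my $during = $caught;       # critical section: handler has not run

    sigprocmask(SIG_SETMASK, $saved)
        or die "can't restore signal mask: $!";

    print "during: $during, after: $caught\n";

The pending signal is delivered once the old mask is restored, so the
handler runs after the critical section rather than in the middle of it.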

=head1 Named Pipes

A named pipe (often referred to as a FIFO) is an old Unix IPC
mechanism for processes communicating on the same machine.  It works
just like regular, connected anonymous pipes, except that the
processes rendezvous using a filename and don't have to be related.

To create a named pipe, use the Unix command mknod(1) or on some
systems, mkfifo(1).  These may not be in your normal path.

    # system return val is backwards, so && not ||
    #
    $ENV{PATH} .= ":/etc:/usr/etc";
    if  (      system('mknod',  $path, 'p')
	    && system('mkfifo', $path) )
    {
	die "mk{nod,fifo} $path failed";
    }


A fifo is convenient when you want to connect a process to an unrelated
one.  When you open a fifo, the program will block until there's something
on the other end.
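
The blocking rendezvous can be seen in a short sketch.  It creates the
fifo with the C<mkfifo> function from the POSIX module rather than an
external command, and the scratch filename is purely hypothetical:

    use POSIX qw(mkfifo);
    use File::Spec;

    # hypothetical scratch path, used only for this demonstration
    my $fifo = File::Spec->catfile(File::Spec->tmpdir, "demo_fifo.$$");
    mkfifo($fifo, 0600)         or die "can't mkfifo $fifo: $!";

    defined(my $pid = fork)     or die "can't fork: $!";
    if ($pid == 0) {            # child: the writer
        # blocks here until the parent opens the fifo for reading
        open(my $out, '>', $fifo) or die "can't write $fifo: $!";
        print $out "hello through the fifo\n";
        close $out;
        exit 0;
    }

    # parent: the reader; blocks until the child opens for writing
    open(my $in, '<', $fifo)    or die "can't read $fifo: $!";
    my $line = <$in>;
    close $in;
    waitpid($pid, 0);
    unlink $fifo;
    print "parent read: $line";

The two processes here happen to be related, but only so that the example
fits in one file; any two programs that agree on the filename would do.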

For example, let's say you'd like to have your F<.signature> file be a
named pipe that has a Perl program on the other end.  Now every time any
program (like a mailer, news reader, finger program, etc.) tries to read
from that file, the reading program will block and your program will
supply the new signature.  We'll use the pipe-checking file test B<-p>
to find out whether anyone (or anything) has accidentally removed our fifo.

    chdir; # go home
    $FIFO = '.signature';
    $ENV{PATH} .= ":/etc:/usr/games";

    while (1) {
	unless (-p $FIFO) {
	    unlink $FIFO;
	    system('mknod', $FIFO, 'p')
		&& die "can't mknod $FIFO: $!";
	}

	# next line blocks until there's a reader
	open (FIFO, "> $FIFO") || die "can't write $FIFO: $!";
	print FIFO "John Smith (smith\@host.org)\n", `fortune -s`;
	close FIFO;
	sleep 2;    # to avoid dup signals
    }

=head2 WARNING

By installing Perl code to deal with signals, you're exposing yourself
to danger from two things.  First, few system library functions are
re-entrant.  If the signal interrupts while Perl is executing one function
(like malloc(3) or printf(3)), and your signal handler then calls the
same function again, you could get unpredictable behavior--often, a
core dump.  Second, Perl isn't itself re-entrant at the lowest levels.
If the signal interrupts Perl while Perl is changing its own internal
data structures, similarly unpredictable behaviour may result.

There are two things you can do, knowing this: be paranoid or be
pragmatic.  The paranoid approach is to do as little as possible in your
signal handler.  Set an existing integer variable that already has a
value, and return.  This doesn't help you if you're in a slow system call,
which will just restart.  That means you have to C<die> to longjmp(3) out
of the handler.  Even this is a little cavalier for the true paranoiac,
who avoids C<die> in a handler because the system I<is> out to get you.
The pragmatic approach is to say ``I know the risks, but prefer the
convenience'', and to do anything you want in your signal handler,
prepared to clean up core dumps now and again.

To forbid signal handlers altogether would bar you from
many interesting programs, including virtually everything in this manpage,
since you could no longer even write SIGCHLD handlers.


=head1 Using open() for IPC

Perl's basic open() statement can also be used for unidirectional interprocess
communication by either appending or prepending a pipe symbol to the second
argument to open().  Here's how to start something up in a child process you
intend to write to:

    open(SPOOLER, "| cat -v | lpr -h 2>/dev/null")
		    || die "can't fork: $!";
    local $SIG{PIPE} = sub { die "spooler pipe broke" };
    print SPOOLER "stuff\n";
    close SPOOLER || die "bad spool: $! $?";

And here's how to start up a child process you intend to read from:

    open(STATUS, "netstat -an 2>&1 |")
		    || die "can't fork: $!";
    while (<STATUS>) {
	next if /^(tcp|udp)/;
	print;
    }
    close STATUS || die "bad netstat: $! $?";

If one can be sure that a particular program is a Perl script that is
expecting filenames in @ARGV, the clever programmer can write something
like this:

    % program f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile

and irrespective of which shell it's called from, the Perl program will
read from the file F<f1>, the process F<cmd1>, standard input (F<tmpfile>
in this case), the F<f2> file, the F<cmd2> command, and finally the F<f3>
file.  Pretty nifty, eh?

You might notice that you could use backticks for much the
same effect as opening a pipe for reading:

    print grep { !/^(tcp|udp)/ } `netstat -an 2>&1`;
    die "bad netstat" if $?;

While this is true on the surface, it's much more efficient to process the
file one line or record at a time because then you don't have to read the
whole thing into memory at once.  It also gives you finer control of the
whole process, letting you kill off the child process early if you'd
like.

Be careful to check both the open() and the close() return values.  If
you're I<writing> to a pipe, you should also trap SIGPIPE.  Otherwise,
think of what happens when you start up a pipe to a command that doesn't
exist: the open() will in all likelihood succeed (it only reflects the
fork()'s success), but then your output will fail--spectacularly.  Perl
can't know whether the command worked because your command is actually
running in a separate process whose exec() might have failed.  Therefore,
while readers of bogus commands return just a quick end of file, writers
to bogus commands will trigger a signal they'd better be prepared to
handle.  Consider:

    open(FH, "|bogus")	or die "can't fork: $!";
    print FH "bang\n"	or die "can't write: $!";
    close FH		or die "can't close: $!";

That won't blow up until the close, and it will blow up with a SIGPIPE.
To catch it, you could use this:

    $SIG{PIPE} = 'IGNORE';
    open(FH, "|bogus")  or die "can't fork: $!";
    print FH "bang\n"   or die "can't write: $!";
    close FH            or die "can't close: status=$?";

=head2 Filehandles

Both the main process and any child processes it forks share the same
STDIN, STDOUT, and STDERR filehandles.  If both processes try to access
them at once, strange things can happen.  You may also want to close
or reopen the filehandles for the child.  You can get around this by
opening your pipe with open(), but on some systems this means that the
child process cannot outlive the parent.

=head2 Background Processes

You can run a command in the background with:

    system("cmd &");

The command's STDOUT and STDERR (and possibly STDIN, depending on your
shell) will be the same as the parent's.  You won't need to catch
SIGCHLD because of the double-fork taking place (see below for more
details).

=head2 Complete Dissociation of Child from Parent

In some cases (starting server processes, for instance) you'll want to
completely dissociate the child process from the parent.  This is
often called daemonization.  A well behaved daemon will also chdir()
to the root directory (so it doesn't prevent unmounting the filesystem
containing the directory from which it was launched) and redirect its
standard file descriptors from and to F</dev/null> (so that random
output doesn't wind up on the user's terminal).

    use POSIX 'setsid';

    sub daemonize {
	chdir '/'		or die "Can't chdir to /: $!";
	open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
	open STDOUT, '>/dev/null'
				or die "Can't write to /dev/null: $!";
	defined(my $pid = fork)	or die "Can't fork: $!";
	exit if $pid;
	setsid			or die "Can't start a new session: $!";
	open STDERR, '>&STDOUT'	or die "Can't dup stdout: $!";
    }

The fork() has to come before the setsid() to ensure that you aren't a
process group leader (the setsid() will fail if you are).  If your
system doesn't have the setsid() function, open F</dev/tty> and use the
C<TIOCNOTTY> ioctl() on it instead.  See L<tty(4)> for details.

Non-Unix users should check their Your_OS::Process module for other
solutions.

=head2 Safe Pipe Opens

Another interesting approach to IPC is making your single program go
multiprocess and communicate between (or even amongst) yourselves.  The
open() function will accept a file argument of either C<"-|"> or C<"|-">
to do a very interesting thing: it forks a child connected to the
filehandle you've opened.  The child is running the same program as the
parent.  This is useful for safely opening a file when running under an
assumed UID or GID, for example.  If you open a pipe I<to> minus, you can
write to the filehandle you opened and your kid will find it in his
STDIN.  If you open a pipe I<from> minus, you can read from the filehandle
you opened whatever your kid writes to his STDOUT.

    use English;
    my $sleep_count = 0;

    do {
	$pid = open(KID_TO_WRITE, "|-");
	unless (defined $pid) {
	    warn "cannot fork: $!";
	    die "bailing out" if $sleep_count++ > 6;
	    sleep 10;
	}
    } until defined $pid;

    if ($pid) {  # parent
	print KID_TO_WRITE @some_data;
	close(KID_TO_WRITE) || warn "kid exited $?";
    } else {     # child
	($EUID, $EGID) = ($UID, $GID); # suid progs only
	open (FILE, "> /safe/file")
	    || die "can't open /safe/file: $!";
	while (<STDIN>) {
	    print FILE; # child's STDIN is parent's KID
	}
	exit;  # don't forget this
    }

Another common use for this construct is when you need to execute
something without the shell's interference.  With system(), it's
straightforward, but you can't use a pipe open or backticks safely.
That's because there's no way to stop the shell from getting its hands on
your arguments.   Instead, use lower-level control to call exec() directly.

Here's a safe backtick or pipe open for read:

    # add error processing as above
    $pid = open(KID_TO_READ, "-|");

    if ($pid) {   # parent
	while (<KID_TO_READ>) {
	    # do something interesting
	}
	close(KID_TO_READ) || warn "kid exited $?";

    } else {      # child
	($EUID, $EGID) = ($UID, $GID); # suid only
	exec($program, @options, @args)
	    || die "can't exec program: $!";
	# NOTREACHED
    }


And here's a safe pipe open for writing:

    # add error processing as above
    $pid = open(KID_TO_WRITE, "|-");
    $SIG{PIPE} = sub { die "whoops, $program pipe broke" };

    if ($pid) {  # parent
	for (@data) {
	    print KID_TO_WRITE;
	}
	close(KID_TO_WRITE) || warn "kid exited $?";

    } else {     # child
	($EUID, $EGID) = ($UID, $GID);
	exec($program, @options, @args)
	    || die "can't exec program: $!";
	# NOTREACHED
    }

Note that these operations are full Unix forks, which means they may not be
correctly implemented on alien systems.  Additionally, these are not true
multithreading.  If you'd like to learn more about threading, see the
F<modules> file mentioned below in the SEE ALSO section.

=head2 Bidirectional Communication with Another Process

While this works reasonably well for unidirectional communication, what
about bidirectional communication?  The obvious thing you'd like to do
doesn't actually work:

    open(PROG_FOR_READING_AND_WRITING, "| some program |")

and if you forget to use the C<use warnings> pragma or the B<-w> flag,
then you'll miss out entirely on the diagnostic message:

    Can't do bidirectional pipe at -e line 1.

If you really want to, you can use the standard open2() library function
to catch both ends.  There's also an open3() for tridirectional I/O so you
can also catch your child's STDERR, but doing so would then require an
awkward select() loop and wouldn't allow you to use normal Perl input
operations.
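
To give an idea of what such a loop involves, here is a sketch that
collects a child's STDOUT and STDERR separately using C<IPC::Open3> and
C<IO::Select>.  The child is just another copy of perl printing one line
to each stream, an assumption made so the example is self-contained.

    use IPC::Open3;
    use IO::Select;
    use Symbol 'gensym';

    my $err = gensym;           # open3 wants a real glob for STDERR
    my $pid = open3(my $in, my $out, $err, $^X, '-e',
        'print "to stdout\n"; print STDERR "to stderr\n"');
    close $in;                  # we have nothing to send the child

    my %got = (out => '', err => '');
    my $sel = IO::Select->new($out, $err);
    while ($sel->count) {
        for my $fh ($sel->can_read) {
            my $n = sysread($fh, my $buf, 4096);
            if (!$n) {          # end of file (or error) on this handle
                $sel->remove($fh);
                close $fh;
                next;
            }
            $got{ $fh == $out ? 'out' : 'err' } .= $buf;
        }
    }
    waitpid($pid, 0);
    print "stdout: $got{out}stderr: $got{err}";

Note the use of sysread() rather than C<< <$fh> >>: mixing buffered reads
with select() is what makes normal Perl input operations unusable here.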

If you look at its source, you'll see that open2() uses low-level
primitives like Unix pipe() and exec() calls to create all the connections.
While it might have been slightly more efficient by using socketpair(), it
would have then been even less portable than it already is.  The open2()
and open3() functions are  unlikely to work anywhere except on a Unix
system or some other one purporting to be POSIX compliant.

Here's an example of using open2():

    use FileHandle;
    use IPC::Open2;
    $pid = open2(*Reader, *Writer, "cat -u -n" );
    print Writer "stuff\n";
    $got = <Reader>;

The problem with this is that Unix buffering is really going to
ruin your day.  Even though your C<Writer> filehandle is auto-flushed,
and the process on the other end will get your data in a timely manner,
you can't usually do anything to force it to give it back to you
in a similarly quick fashion.  In this case, we could, because we
gave I<cat> a B<-u> flag to make it unbuffered.  But very few Unix
commands are designed to operate over pipes, so this seldom works
unless you yourself wrote the program on the other end of the
double-ended pipe.

A solution to this is the nonstandard F<Comm.pl> library.  It uses
pseudo-ttys to make your program behave more reasonably:

    require 'Comm.pl';
    $ph = open_proc('cat -n');
    for (1..10) {
	print $ph "a line\n";
	print "got back ", scalar <$ph>;
    }

This way you don't have to have control over the source code of the
program you're using.  The F<Comm> library also has expect()
and interact() functions.  Find the library (and we hope its
successor F<IPC::Chat>) at your nearest CPAN archive as detailed
in the SEE ALSO section below.

The newer Expect.pm module from CPAN also addresses this kind of thing.
This module requires two other modules from CPAN: IO::Pty and IO::Stty.
It sets up a pseudo-terminal to interact with programs that insist on
talking to the terminal device driver.  If your system is
amongst those supported, this may be your best bet.

=head2 Bidirectional Communication with Yourself

If you want, you may make low-level pipe() and fork() calls
to stitch this together by hand.  This example only
talks to itself, but you could reopen the appropriate
handles to STDIN and STDOUT and call other processes.

    #!/usr/bin/perl -w
    # pipe1 - bidirectional communication using two pipe pairs
    #         designed for the socketpair-challenged
    use IO::Handle;	# thousands of lines just for autoflush :-(
    pipe(PARENT_RDR, CHILD_WTR);		# XXX: failure?
    pipe(CHILD_RDR,  PARENT_WTR);		# XXX: failure?
    CHILD_WTR->autoflush(1);
    PARENT_WTR->autoflush(1);

    if ($pid = fork) {
	close PARENT_RDR; close PARENT_WTR;
	print CHILD_WTR "Parent Pid $$ is sending this\n";
	chomp($line = <CHILD_RDR>);
	print "Parent Pid $$ just read this: `$line'\n";
	close CHILD_RDR; close CHILD_WTR;
	waitpid($pid,0);
    } else {
	die "cannot fork: $!" unless defined $pid;
	close CHILD_RDR; close CHILD_WTR;
	chomp($line = <PARENT_RDR>);
	print "Child Pid $$ just read this: `$line'\n";
	print PARENT_WTR "Child Pid $$ is sending this\n";
	close PARENT_RDR; close PARENT_WTR;
	exit;
    }

But you don't actually have to make two pipe calls.  If you 
have the socketpair() system call, it will do this all for you.

    #!/usr/bin/perl -w
    # pipe2 - bidirectional communication using socketpair
    #   "the best ones always go both ways"

    use Socket;
    use IO::Handle;	# thousands of lines just for autoflush :-(
    # We say AF_UNIX because although *_LOCAL is the
    # POSIX 1003.1g form of the constant, many machines
    # still don't have it.
    socketpair(CHILD, PARENT, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
				or  die "socketpair: $!";

    CHILD->autoflush(1);
    PARENT->autoflush(1);

    if ($pid = fork) {
	close PARENT;
	print CHILD "Parent Pid $$ is sending this\n";
	chomp($line = <CHILD>);
	print "Parent Pid $$ just read this: `$line'\n";
	close CHILD;
	waitpid($pid,0);
    } else {
	die "cannot fork: $!" unless defined $pid;
	close CHILD;
	chomp($line = <PARENT>);
	print "Child Pid $$ just read this: `$line'\n";
	print PARENT "Child Pid $$ is sending this\n";
	close PARENT;
	exit;
    }

=head1 Sockets: Client/Server Communication

While not limited to Unix-derived operating systems (e.g., WinSock on PCs
provides socket support, as do some VMS libraries), you may not have
sockets on your system, in which case this section probably isn't going to do
you much good.  With sockets, you can do both virtual circuits (i.e., TCP
streams) and datagrams (i.e., UDP packets).  You may be able to do even more
depending on your system.

The Perl function calls for dealing with sockets have the same names as
the corresponding system calls in C, but their arguments tend to differ
for two reasons: first, Perl filehandles work differently than C file
descriptors.  Second, Perl already knows the length of its strings, so you
don't need to pass that information.

One of the major problems with old socket code in Perl was that it used
hard-coded values for some of the constants, which severely hurt
portability.  If you ever see code that does anything like explicitly
setting C<$AF_INET = 2>, you know you're in for big trouble.  An
immeasurably superior approach is to use the C<Socket> module, which more
reliably grants access to various constants and functions you'll need.
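
A brief sketch of what the module provides, using an arbitrary port number
and the loopback address so that no name lookup is needed:

    use Socket;

    # pack a port and an address into the form socket calls expect
    my $packed = sockaddr_in(2345, inet_aton('127.0.0.1'));

    # in list context, sockaddr_in() unpacks it again
    my ($port, $iaddr) = sockaddr_in($packed);
    print "port $port, address ", inet_ntoa($iaddr), "\n";

    # the constant comes from the system, not from a guess
    print "AF_INET is ", AF_INET, " on this system\n";

Nothing in this code depends on the numeric value of C<AF_INET> or on the
layout of the packed address, which is what makes it portable.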

If you're not writing a server/client for an existing protocol like
NNTP or SMTP, you should give some thought to how your server will
know when the client has finished talking, and vice-versa.  Most
protocols are based on one-line messages and responses (so one party
knows the other has finished when a "\n" is received) or multi-line
messages and responses that end with a period on an empty line
("\n.\n" terminates a message/response).

=head2 Internet Line Terminators

The Internet line terminator is "\015\012".  Under ASCII variants of
Unix, that could usually be written as "\r\n", but under other systems,
"\r\n" might at times be "\015\015\012", "\012\012\015", or something
completely different.  The standards specify writing "\015\012" to be
conformant (be strict in what you provide), but they also recommend
accepting a lone "\012" on input (be lenient in what you require).
We haven't always been very good about that in the code in this manpage,
but unless you're on a Mac, you'll probably be ok.
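
As a sketch of being lenient on input, the hypothetical read_message()
helper below (it is not part of any module) reads a multi-line message
that ends with a period on a line of its own, accepting either
terminator.  It is demonstrated on an in-memory filehandle rather than a
real socket, and it assumes that "\n" is "\012" on your system.

    sub read_message {
        my ($fh) = @_;
        my @lines;
        while (defined(my $line = <$fh>)) {
            $line =~ s/\015?\012\z//;   # "\015\012" or a lone "\012"
            last if $line eq '.';       # a lone period ends the message
            push @lines, $line;
        }
        return @lines;
    }

    my $data = "first line\015\012second line\012.\015\012ignored\015\012";
    open(my $fh, '<', \$data)   or die "can't open in-memory handle: $!";
    my @message = read_message($fh);
    print scalar(@message), " lines: @message\n";

When writing, the same program would still send "\015\012" explicitly at
the end of every line.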

=head2 Internet TCP Clients and Servers

Use Internet-domain sockets when you want to do client-server
communication that might extend to machines outside of your own system.

Here's a sample TCP client using Internet-domain sockets:

    #!/usr/bin/perl -w
    use strict;
    use Socket;
    my ($remote,$port, $iaddr, $paddr, $proto, $line);

    $remote  = shift || 'localhost';
    $port    = shift || 2345;  # random port
    if ($port =~ /\D/) { $port = getservbyname($port, 'tcp') }
    die "No port" unless $port;
    $iaddr   = inet_aton($remote) 		|| die "no host: $remote";
    $paddr   = sockaddr_in($port, $iaddr);

    $proto   = getprotobyname('tcp');
    socket(SOCK, PF_INET, SOCK_STREAM, $proto)	|| die "socket: $!";
    connect(SOCK, $paddr)    || die "connect: $!";
    while (defined($line = <SOCK>)) {
	print $line;
    }

    close (SOCK)	    || die "close: $!";
    exit;

And here's a corresponding server to go along with it.  We'll
leave the address as INADDR_ANY so that the kernel can choose
the appropriate interface on multihomed hosts.  If you want to sit
on a particular interface (like the external side of a gateway
or firewall machine), you should fill this in with your real address
instead.

    #!/usr/bin/perl -Tw
    use strict;
    BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
    use Socket;
    use Carp;
    my $EOL = "\015\012";

    sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }

    my $port = shift || 2345;
    my $proto = getprotobyname('tcp');

    ($port) = $port =~ /^(\d+)$/                        or die "invalid port";

    socket(Server, PF_INET, SOCK_STREAM, $proto)	|| die "socket: $!";
    setsockopt(Server, SOL_SOCKET, SO_REUSEADDR,
					pack("l", 1)) 	|| die "setsockopt: $!";
    bind(Server, sockaddr_in($port, INADDR_ANY))	|| die "bind: $!";
    listen(Server,SOMAXCONN) 				|| die "listen: $!";

    logmsg "server started on port $port";

    my $paddr;

    $SIG{CHLD} = \&REAPER;

    for ( ; $paddr = accept(Client,Server); close Client) {
	my($port,$iaddr) = sockaddr_in($paddr);
	my $name = gethostbyaddr($iaddr,AF_INET);

	logmsg "connection from $name [",
		inet_ntoa($iaddr), "] at port $port";

	print Client "Hello there, $name, it's now ",
			scalar localtime, $EOL;
    }

And here's a multithreaded version.  It's multithreaded in that
like most typical servers, it spawns (forks) a slave server to
handle the client request so that the master server can quickly
go back to service a new client.

    #!/usr/bin/perl -Tw
    use strict;
    BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
    use Socket;
    use Carp;
    my $EOL = "\015\012";

    sub spawn;  # forward declaration
    sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }

    my $port = shift || 2345;
    my $proto = getprotobyname('tcp');

    ($port) = $port =~ /^(\d+)$/                        or die "invalid port";

    socket(Server, PF_INET, SOCK_STREAM, $proto)	|| die "socket: $!";
    setsockopt(Server, SOL_SOCKET, SO_REUSEADDR,
					pack("l", 1)) 	|| die "setsockopt: $!";
    bind(Server, sockaddr_in($port, INADDR_ANY))	|| die "bind: $!";
    listen(Server,SOMAXCONN) 				|| die "listen: $!";

    logmsg "server started on port $port";

    my $waitedpid = 0;
    my $paddr;

    sub REAPER {
	$waitedpid = wait;
	$SIG{CHLD} = \&REAPER;  # loathe sysV
	logmsg "reaped $waitedpid" . ($? ? " with exit $?" : '');
    }

    $SIG{CHLD} = \&REAPER;

    for ( $waitedpid = 0;
	  ($paddr = accept(Client,Server)) || $waitedpid;
	  $waitedpid = 0, close Client)
    {
	next if $waitedpid and not $paddr;
	my($port,$iaddr) = sockaddr_in($paddr);
	my $name = gethostbyaddr($iaddr,AF_INET);

	logmsg "connection from $name [",
		inet_ntoa($iaddr), "] at port $port";

	spawn sub {
	    $|=1;
	    print "Hello there, $name, it's now ", scalar localtime, $EOL;
	    exec '/usr/games/fortune'		# XXX: `wrong' line terminators
		or confess "can't exec fortune: $!";
	};

    }

    sub spawn {
	my $coderef = shift;

	unless (@_ == 0 && $coderef && ref($coderef) eq 'CODE') {
	    confess "usage: spawn CODEREF";
	}

	my $pid;
	if (!defined($pid = fork)) {
	    logmsg "cannot fork: $!";
	    return;
	} elsif ($pid) {
	    logmsg "begat $pid";
	    return; # I'm the parent
	}
	# else I'm the child -- go spawn

	open(STDIN,  "<&Client")   || die "can't dup client to stdin";
	open(STDOUT, ">&Client")   || die "can't dup client to stdout";
	## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
	exit &$coderef();
    }

This server takes the trouble to clone off a child version via fork() for
each incoming request.  That way it can handle many requests at once,
which you might not always want.  Even if you don't fork(), the listen()
will allow that many pending connections.  Forking servers have to be
particularly careful about cleaning up their dead children (called
"zombies" in Unix parlance), because otherwise you'll quickly fill up your
process table.

We suggest that you use the B<-T> flag to use taint checking (see L<perlsec>)
even if you aren't running setuid or setgid.  This is always a good idea
for servers and other programs run on behalf of someone else (like CGI
scripts), because it lessens the chances that people from the outside will
be able to compromise your system.

Let's look at another TCP client.  This one connects to the TCP "time"
service on a number of different machines and shows how far their clocks
differ from the system on which it's being run:

    #!/usr/bin/perl  -w
    use strict;
    use Socket;

    my $SECS_of_70_YEARS = 2208988800;
    sub ctime { scalar localtime(shift) }

    my $iaddr = gethostbyname('localhost');
    my $proto = getprotobyname('tcp');
    my $port = getservbyname('time', 'tcp');
    my $paddr = sockaddr_in(0, $iaddr);
    my($host);

    $| = 1;
    printf "%-24s %8s %s\n",  "localhost", 0, ctime(time());

    foreach $host (@ARGV) {
	printf "%-24s ", $host;
	my $hisiaddr = inet_aton($host)     || die "unknown host";
	my $hispaddr = sockaddr_in($port, $hisiaddr);
	socket(SOCKET, PF_INET, SOCK_STREAM, $proto)   || die "socket: $!";
	connect(SOCKET, $hispaddr)          || die "connect: $!";
	my $rtime = '    ';
	read(SOCKET, $rtime, 4);
	close(SOCKET);
	my $histime = unpack("N", $rtime) - $SECS_of_70_YEARS ;
	printf "%8d %s\n", $histime - time, ctime($histime);
    }

=head2 Unix-Domain TCP Clients and Servers

That's fine for Internet-domain clients and servers, but what about local
communications?  While you can use the same setup, sometimes you don't
want to.  Unix-domain sockets are local to the current host, and are often
used internally to implement pipes.  Unlike Internet domain sockets, Unix
domain sockets can show up in the file system with an ls(1) listing.

    % ls -l /dev/log
    srw-rw-rw-  1 root            0 Oct 31 07:23 /dev/log

You can test for these with Perl's B<-S> file test:

    unless ( -S '/dev/log' ) {
	die "something's wicked with the log system";
    }

Here's a sample Unix-domain client:

    #!/usr/bin/perl -w
    use Socket;
    use strict;
    my ($rendezvous, $line);

    $rendezvous = shift || '/tmp/catsock';
    socket(SOCK, PF_UNIX, SOCK_STREAM, 0)	|| die "socket: $!";
    connect(SOCK, sockaddr_un($rendezvous))	|| die "connect: $!";
    while (defined($line = <SOCK>)) {
	print $line;
    }
    exit;

And here's a corresponding server.  You don't have to worry about silly
network terminators here because Unix domain sockets are guaranteed
to be on the localhost, and thus everything works right.

    #!/usr/bin/perl -Tw
    use strict;
    use Socket;
    use Carp;

    BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
    sub spawn;  # forward declaration
    sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }

    my $NAME = '/tmp/catsock';
    my $uaddr = sockaddr_un($NAME);
    my $proto = getprotobyname('tcp');

    socket(Server,PF_UNIX,SOCK_STREAM,0) 	|| die "socket: $!";
    unlink($NAME);
    bind  (Server, $uaddr) 			|| die "bind: $!";
    listen(Server,SOMAXCONN)			|| die "listen: $!";

    logmsg "server started on $NAME";

    my $waitedpid;

    sub REAPER {
	$waitedpid = wait;
	$SIG{CHLD} = \&REAPER;  # loathe sysV
	logmsg "reaped $waitedpid" . ($? ? " with exit $?" : '');
    }

    $SIG{CHLD} = \&REAPER;


    for ( $waitedpid = 0;
	  accept(Client,Server) || $waitedpid;
	  $waitedpid = 0, close Client)
    {
	next if $waitedpid;
	logmsg "connection on $NAME";
	spawn sub {
	    print "Hello there, it's now ", scalar localtime, "\n";
	    exec '/usr/games/fortune' or die "can't exec fortune: $!";
	};
    }

    sub spawn {
	my $coderef = shift;

	unless (@_ == 0 && $coderef && ref($coderef) eq 'CODE') {
	    confess "usage: spawn CODEREF";
	}

	my $pid;
	if (!defined($pid = fork)) {
	    logmsg "cannot fork: $!";
	    return;
	} elsif ($pid) {
	    logmsg "begat $pid";
	    return; # I'm the parent
	}
	# else I'm the child -- go spawn

	open(STDIN,  "<&Client")   || die "can't dup client to stdin";
	open(STDOUT, ">&Client")   || die "can't dup client to stdout";
	## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
	exit &$coderef();
    }

As you see, it's remarkably similar to the Internet domain TCP server, so
much so, in fact, that the helper functions spawn(), logmsg(), and
REAPER() are exactly the same as in the other server.

So why would you ever want to use a Unix domain socket instead of a
simpler named pipe?  Because a named pipe doesn't give you sessions.  You
can't tell one process's data from another's.  With socket programming,
you get a separate session for each client: that's why accept() takes two
arguments.

For example, let's say that you have a long-running database server daemon
that you want folks from the World Wide Web to be able to access, but only
if they go through a CGI interface.  You'd have a small, simple CGI
program that does whatever checks and logging you feel like, and then acts
as a Unix-domain client and connects to your private server.
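
A rough sketch of that arrangement follows, with both roles in one script
so that it can be run as is.  It uses the higher-level IO::Socket::UNIX
class rather than the raw calls shown above.  The socket path and the
one-line request format are made up for illustration; a real daemon would
define its own.

    #!/usr/bin/perl -w
    use strict;
    use IO::Socket::UNIX;
    use File::Spec;

    # hypothetical rendezvous point; a real daemon would use a fixed path
    my $path = File::Spec->catfile(File::Spec->tmpdir, "dbsock-demo-$$");
    unlink($path);

    my $server = IO::Socket::UNIX->new(Local => $path, Listen => 5)
        or die "can't listen on $path: $!";

    defined(my $pid = fork) or die "can't fork: $!";
    if ($pid == 0) {
        # child: stands in for the private server, answers one client
        my $client = $server->accept or die "accept: $!";
        my $query  = <$client>;
        print $client "you asked: $query";
        close $client;
        exit 0;
    }

    # parent: stands in for the CGI program acting as a client
    close $server;
    my $sock = IO::Socket::UNIX->new(Peer => $path)
        or die "can't connect to $path: $!";
    print $sock "lookup camel\n";
    my $reply = <$sock>;
    close $sock;
    waitpid($pid, 0);
    unlink($path);
    print $reply;

Each client that connects gets its own handle from C<accept>, which is
what keeps one client's conversation separate from another's.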

=head1 TCP Clients with IO::Socket

For those preferring a higher-level interface to socket programming, the
IO::Socket module provides an object-oriented approach.  IO::Socket is
included as part of the standard Perl distribution as of the 5.004
release.  If you're running an earlier version of Perl, just fetch
IO::Socket from CPAN, where you'll also find modules providing easy
interfaces to the following systems: DNS, FTP, Ident (RFC 931), NIS and
NISPlus, NNTP, Ping, POP3, SMTP, SNMP, SSLeay, Telnet, and Time--just
to name a few.

=head2 A Simple Client

Here's a client that creates a TCP connection to the "daytime"
service at port 13 of the host name "localhost" and prints out everything
that the server there cares to provide.

    #!/usr/bin/perl -w
    use IO::Socket;
    $remote = IO::Socket::INET->new(
			Proto    => "tcp",
			PeerAddr => "localhost",
			PeerPort => "daytime(13)",
		    )
		  or die "cannot connect to daytime port at localhost";
    while ( <$remote> ) { print }

When you run this program, you should get something back that
looks like this:

    Wed May 14 08:40:46 MDT 1997

Here are what those parameters to the C<new> constructor mean:

=over 4

=item C<Proto>

This is which protocol to use.  In this case, the socket handle returned
will be connected to a TCP socket, because we want a stream-oriented
connection, that is, one that acts pretty much like a plain old file.
Not all sockets are of this type.  For example, the UDP protocol
can be used to make a datagram socket, used for message-passing.

=item C<PeerAddr>

This is the name or Internet address of the remote host the server is
running on.  We could have specified a longer name like C<"www.perl.com">,
or an address like C<"204.148.40.9">.  For demonstration purposes, we've
used the special hostname C<"localhost">, which should always mean the
current machine you're running on.  The corresponding Internet address
for localhost is C<"127.1">, if you'd rather use that.

=item C<PeerPort>

This is the service name or port number we'd like to connect to.
We could have gotten away with using just C<"daytime"> on systems with a
well-configured system services file (F</etc/services> under Unix),
but just in case, we've specified the port number (13) in parentheses.
Using just the number would also have worked, but constant numbers make
careful programmers nervous.

=back

Notice how the return value from the C<new> constructor is used as
a filehandle in the C<while> loop?  That's what's called an indirect
filehandle, a scalar variable containing a filehandle.  You can use
it the same way you would a normal filehandle.  For example, you
can read one line from it this way:

    $line = <$handle>;

all remaining lines from it this way:

    @lines = <$handle>;

and send a line of data to it this way:

    print $handle "some data\n";
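
These three operations can be tried without any network at all.  The
sketch below assumes a system that supports socketpair(), and uses a
connected pair of lexical handles (the names C<$here> and C<$there> are
arbitrary) in place of a real client and server.

    use strict;
    use Socket;
    use IO::Handle;     # for the autoflush method on lexical handles

    socketpair(my $here, my $there, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    $here->autoflush(1);

    # send some lines, then say we're done so the reader sees EOF
    print $here "first\n", "second\n", "third\n";
    shutdown($here, 1);

    my $line  = <$there>;        # one line
    my @lines = <$there>;        # all remaining lines
    print "one line: $line";
    print "the rest: @lines";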

=head2 A Webget Client

Here's a simple client that takes a remote host to fetch a document
from, and then a list of documents to get from that host.  This is a
more interesting client than the previous one because it first sends
something to the server before fetching the server's response.

    #!/usr/bin/perl -w
    use IO::Socket;
    unless (@ARGV > 1) { die "usage: $0 host document ..." }
    $host = shift(@ARGV);
    $EOL = "\015\012";
    $BLANK = $EOL x 2;
    foreach $document ( @ARGV ) {
	$remote = IO::Socket::INET->new( Proto     => "tcp",
					 PeerAddr  => $host,
					 PeerPort  => "http(80)",
				        );
	unless ($remote) { die "cannot connect to http daemon on $host" }
	$remote->autoflush(1);
	print $remote "GET $document HTTP/1.0" . $BLANK;
	while ( <$remote> ) { print }
	close $remote;
    }

The web server handling the "http" service is assumed to be at
its standard port, number 80.  If the web server you're trying to
connect to is at a different port (like 1080 or 8080), you should specify
it as the named-parameter pair, C<< PeerPort => 8080 >>.  The C<autoflush>
method is used on the socket because otherwise the system would buffer
up the output we sent it.  (If you're on a Mac, you'll also need to
change every C<"\n"> in your code that sends data over the network to
be a C<"\015\012"> instead.)

Connecting to the server is only the first part of the process: once you
have the connection, you have to use the server's language.  Each server
on the network has its own little command language that it expects as
input.  The string that we send to the server starting with "GET" is in
HTTP syntax.  In this case, we simply request each specified document.
Yes, we really are making a new connection for each document, even though
it's the same host.  That's the way you always used to have to speak HTTP.
Recent versions of web browsers may request that the remote server leave
the connection open a little while, but the server doesn't have to honor
such a request.
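
To make that "language" a little more concrete, here is a small sketch
that builds the same request string I<webget> sends and picks apart a
status line of the kind a server sends back.  The helper names
C<http_request> and C<parse_status> are invented for this example, and
the pattern covers only the simple status-line shape; it is not a full
HTTP parser.

    use strict;

    my $EOL = "\015\012";

    # build the request that webget sends for one document
    sub http_request {
        my ($document) = @_;
        return "GET $document HTTP/1.0" . $EOL x 2;
    }

    # split a status line such as "HTTP/1.1 404 File Not Found"
    sub parse_status {
        my ($line) = @_;
        $line =~ s/\015?\012\z//;
        my ($version, $code, $reason) =
            $line =~ m{^HTTP/(\d+\.\d+) (\d{3}) ?(.*)$}
                or return;
        return ($version, $code, $reason);
    }

    my $request = http_request("/guanaco.html");
    my ($version, $code, $reason) =
        parse_status("HTTP/1.1 404 File Not Found$EOL");
    print "status $code ($reason)\n";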

Here's an example of running that program, which we'll call I<webget>:

    % webget www.perl.com /guanaco.html
    HTTP/1.1 404 File Not Found
    Date: Thu, 08 May 1997 18:02:32 GMT
    Server: Apache/1.2b6
    Connection: close
    Content-type: text/html

    <HEAD><TITLE>404 File Not Found</TITLE></HEAD>
    <BODY><H1>File Not Found</H1>
    The requested URL /guanaco.html was not found on this server.<P>
    </BODY>

Ok, so that's not very interesting, because it didn't find that
particular document.  But a long response wouldn't have fit on this page.

For a more fully-featured version of this program, you should look to
the I<lwp-request> program included with the LWP modules from CPAN.

=head2 Interactive Client with IO::Socket

Well, that's all fine if you want to send one command and get one answer,
but what about setting up something fully interactive, somewhat like
the way I<telnet> works?  That way you can type a line, get the answer,
type a line, get the answer, etc.

This client is more complicated than the two we've done so far, but if
you're on a system that supports the powerful C<fork> call, the solution
isn't that rough.  Once you've made the connection to whatever service
you'd like to chat with, call C<fork> to clone your process.  Each of
these two identical processes has a very simple job to do: the parent
copies everything from the socket to standard output, while the child
simultaneously copies everything from standard input to the socket.
To accomplish the same thing using just one process would be I<much>
harder, because it's easier to code two processes to do one thing than it
is to code one process to do two things.  (This keep-it-simple principle
is one of the cornerstones of the Unix philosophy, and of good software engineering as
well, which is probably why it's spread to other systems.)

Here's the code:

    #!/usr/bin/perl -w
    use strict;
    use IO::Socket;
    my ($host, $port, $kidpid, $handle, $line);

    unless (@ARGV == 2) { die "usage: $0 host port" }
    ($host, $port) = @ARGV;

    # create a tcp connection to the specified host and port
    $handle = IO::Socket::INET->new(Proto     => "tcp",
				    PeerAddr  => $host,
				    PeerPort  => $port)
	   or die "can't connect to port $port on $host: $!";

    $handle->autoflush(1);		# so output gets there right away
    print STDERR "[Connected to $host:$port]\n";

    # split the program into two processes, identical twins
    die "can't fork: $!" unless defined($kidpid = fork());

    # the if{} block runs only in the parent process
    if ($kidpid) {
	# copy the socket to standard output
	while (defined ($line = <$handle>)) {
	    print STDOUT $line;
	}
	kill("TERM", $kidpid);  		# send SIGTERM to child
    }
    # the else{} block runs only in the child process
    else {
	# copy standard input to the socket
	while (defined ($line = <STDIN>)) {
	    print $handle $line;
	}
    }

The C<kill> function in the parent's C<if> block is there to send a
signal to our child process (currently running in the C<else> block)
as soon as the remote server has closed its end of the connection.

If the remote server sends data a byte at a time, and you need that
data immediately without waiting for a newline (which might not happen),
you may wish to replace the C<while> loop in the parent with the
following:

    my $byte;
    while (sysread($handle, $byte, 1) == 1) {
	print STDOUT $byte;
    }

Making a system call for each byte you want to read is not very efficient
(to put it mildly) but is the simplest to explain and works reasonably
well.
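
A middle ground, sketched here on the assumption that you don't need to
act on each byte individually, is to ask C<sysread> for a larger chunk.
It returns as soon as some data is available rather than waiting to fill
the buffer, so a prompt without a newline still shows up right away.  A
pipe stands in for the socket so that the fragment runs on its own.

    use strict;

    # a pipe stands in for the socket in this sketch
    pipe(my $reader, my $writer) or die "pipe: $!";
    syswrite($writer, "Command? ") or die "syswrite: $!";
    close $writer;

    my ($chunk, $output) = ('', '');
    while (1) {
        my $got = sysread($reader, $chunk, 4096);
        die "sysread: $!" unless defined $got;   # undef means error
        last if $got == 0;                       # 0 means end of file
        $output .= $chunk;                       # whatever was available
    }
    print STDOUT $output, "\n";

Unlike the one-byte loop, this version tells an error (C<undef>) apart
from end of file (C<0>).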

=head1 TCP Servers with IO::Socket

As always, setting up a server is a little bit more involved than running a client.
The model is that the server creates a special kind of socket that
does nothing but listen on a particular port for incoming connections.
It does this by calling the C<< IO::Socket::INET->new() >> method with
slightly different arguments than the client did.

=over 4

=item Proto

This is which protocol to use.  Like our clients, we'll
still specify C<"tcp"> here.

=item LocalPort

We specify a local
port in the C<LocalPort> argument, which we didn't do for the client.
This is the service name or port number for which you want to be the
server. (Under Unix, ports under 1024 are restricted to the
superuser.)  In our sample, we'll use port 9000, but you can use
any port that's not currently in use on your system.  If you try
to use one already in use, you'll get an "Address already in use"
message.  Under Unix, the C<netstat -a> command will show
which services currently have servers.

=item Listen

The C<Listen> parameter is set to the maximum number of
pending connections we can accept until we turn away incoming clients.
Think of it as a call-waiting queue for your telephone.
The low-level Socket module has a special symbol for the system maximum, which
is SOMAXCONN.

=item Reuse

The C<Reuse> parameter is needed so that we can restart our server
manually without waiting a few minutes to allow system buffers to
clear out.

=back

Once the generic server socket has been created using the parameters
listed above, the server then waits for a new client to connect
to it.  The server blocks in the C<accept> method, which eventually returns a
bidirectional connection to the remote client.  (Make sure to autoflush
this handle to circumvent buffering.)

To add to user-friendliness, our server prompts the user for commands.
Most servers don't do this.  Because of the prompt without a newline,
you'll have to use the C<sysread> variant of the interactive client above.

This server accepts one of five different commands, sending output
back to the client.  Note that unlike most network servers, this one
only handles one incoming client at a time.  Multithreaded servers are
covered in Chapter 6 of the Camel.

Here's the code.

 #!/usr/bin/perl -w
 use IO::Socket;
 use Net::hostent;		# for OO version of gethostbyaddr

 $PORT = 9000;			# pick something not in use

 $server = IO::Socket::INET->new( Proto     => 'tcp',
                                  LocalPort => $PORT,
                                  Listen    => SOMAXCONN,
                                  Reuse     => 1);

 die "can't setup server" unless $server;
 print "[Server $0 accepting clients]\n";

 while ($client = $server->accept()) {
   $client->autoflush(1);
   print $client "Welcome to $0; type help for command list.\n";
   $hostinfo = gethostbyaddr($client->peeraddr);
   printf "[Connect from %s]\n",
          $hostinfo ? $hostinfo->name : $client->peerhost;  # lookup may fail
   print $client "Command? ";
   while ( <$client>) {
     next unless /\S/;	     # blank line
     if    (/quit|exit/i)    { last;                                     }
     elsif (/date|time/i)    { printf $client "%s\n", scalar localtime;  }
     elsif (/who/i )         { print  $client `who 2>&1`;                }
     elsif (/cookie/i )      { print  $client `/usr/games/fortune 2>&1`; }
     elsif (/motd/i )        { print  $client `cat /etc/motd 2>&1`;      }
     else {
       print $client "Commands: quit date who cookie motd\n";
     }
   } continue {
      print $client "Command? ";
   }
   close $client;
 }
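
The patterns in that loop match anywhere in the line, so input such as
"whoami" is treated as C<who>.  One possible tightening, shown here only
as a sketch, is a table keyed on the exact command word.  The names
C<%command>, C<help>, and C<respond> are hypothetical and not part of the
server above, and the C<cookie> command is left out for brevity.

    use strict;

    # hypothetical command table: exact names instead of loose patterns
    my %command = (
        date => sub { scalar(localtime) . "\n" },
        time => sub { scalar(localtime) . "\n" },
        who  => sub { scalar `who 2>&1` },
        motd => sub { scalar `cat /etc/motd 2>&1` },
    );

    sub help { "Commands: quit " . join(" ", sort keys %command) . "\n" }

    # returns the reply text, or undef when the client wants to leave
    sub respond {
        my ($line) = @_;
        my ($word) = $line =~ /^\s*(\w+)\s*$/ or return help();
        $word = lc $word;
        return undef if $word eq 'quit' || $word eq 'exit';
        my $handler = $command{$word} or return help();
        return $handler->();
    }

    print respond("DATE\n");      # same as "date"
    print respond("whoami\n");    # no longer mistaken for "who"

In the server's inner loop, the body would then reduce to printing the
result of C<respond($_)> and leaving the loop when it is undefined.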

=head1 UDP: Message Passing

Another kind of client-server setup is one that uses not connections, but
messages.  UDP communications involve much lower overhead but also provide
less reliability, as there are no promises that messages will arrive at
all, let alone in order and unmangled.  Still, UDP offers some advantages
over TCP, including being able to "broadcast" or "multicast" to a whole
bunch of destination hosts at once (usually on your local subnet).  If you
find yourself overly concerned about reliability and start building checks
into your message system, then you probably should use just TCP to start
with.

Note that UDP datagrams are I<not> a bytestream and should not be treated
as such. This makes using I/O mechanisms with internal buffering
like stdio (i.e. print() and friends) especially cumbersome. Use syswrite(),
or better send(), like in the example below.
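
Before the fuller program, here is a minimal sketch of the difference,
assuming a working loopback interface: two sends produce two separate
datagrams, and each recv() hands back exactly one of them.  The handle
names C<$rcvr> and C<$sndr> are arbitrary.

    use strict;
    use Socket;

    my $proto = getprotobyname('udp');

    # receiver: bind to the loopback address, let the kernel pick a port
    socket(my $rcvr, PF_INET, SOCK_DGRAM, $proto)   || die "socket: $!";
    bind($rcvr, sockaddr_in(0, INADDR_LOOPBACK))    || die "bind: $!";
    my ($port) = sockaddr_in(getsockname($rcvr));

    # sender: each send() is one datagram with its own destination
    socket(my $sndr, PF_INET, SOCK_DGRAM, $proto)   || die "socket: $!";
    my $dest = sockaddr_in($port, INADDR_LOOPBACK);
    for my $msg ("first", "second") {
        defined(send($sndr, $msg, 0, $dest))        || die "send: $!";
    }

    # each recv() returns one whole datagram, never a merged stream
    my @got;
    for (1 .. 2) {
        my $buf = '';
        defined(recv($rcvr, $buf, 1024, 0))         || die "recv: $!";
        push @got, $buf;
    }
    print "received: @got\n";

On a real network either datagram could be lost or reordered, which is
why this sketch stays on the local machine.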

Here's a UDP program similar to the sample Internet TCP client given
earlier.  However, instead of checking one host at a time, the UDP version
will check many of them asynchronously by simulating a multicast and then
using select() to do a timed-out wait for I/O.  To do something similar
with TCP, you'd have to use a different socket handle for each host.

    #!/usr/bin/perl -w
    use strict;
    use Socket;
    use Sys::Hostname;

    my ( $count, $hisiaddr, $hispaddr, $histime,
	 $host, $iaddr, $paddr, $port, $proto,
	 $rin, $rout, $rtime, $SECS_of_70_YEARS);

    $SECS_of_70_YEARS      = 2208988800;

    $iaddr = gethostbyname(hostname());
    $proto = getprotobyname('udp');
    $port = getservbyname('time', 'udp');
    $paddr = sockaddr_in(0, $iaddr); # 0 means let kernel pick

    socket(SOCKET, PF_INET, SOCK_DGRAM, $proto)   || die "socket: $!";
    bind(SOCKET, $paddr)                          || die "bind: $!";

    $| = 1;
    printf "%-12s %8s %s\n",  "localhost", 0, scalar localtime time;
    $count = 0;
    for $host (@ARGV) {
	$count++;
	$hisiaddr = inet_aton($host) 	|| die "unknown host";
	$hispaddr = sockaddr_in($port, $hisiaddr);
	defined(send(SOCKET, 0, 0, $hispaddr))    || die "send $host: $!";
    }

    $rin = '';
    vec($rin, fileno(SOCKET), 1) = 1;

    # timeout after 10.0 seconds
    while ($count && select($rout = $rin, undef, undef, 10.0)) {
	$rtime = '';
	($hispaddr = recv(SOCKET, $rtime, 4, 0)) 	|| die "recv: $!";
	($port, $hisiaddr) = sockaddr_in($hispaddr);
	$host = gethostbyaddr($hisiaddr, AF_INET);
	$histime = unpack("N", $rtime) - $SECS_of_70_YEARS ;
	printf "%-12s ", $host;
	printf "%8d %s\n", $histime - time, scalar localtime($histime);
	$count--;
    }

Note that this example does not include any retries and may consequently
fail to contact a reachable host. The most prominent reason for this
is congestion of the queues on the sending host if the number of
hosts to contact is sufficiently large.
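
On a separate point, the constant C<$SECS_of_70_YEARS> in the program
deserves a word.  The "time" service counts seconds from the start of
1900, whereas Perl's C<time> counts from the start of 1970.  The sketch
below works through that arithmetic and the 32-bit network-order
unpacking on a made-up reply value.

    use strict;

    # 1900 through 1969 is 70 years, 17 of them leap years
    # (1904, 1908, ..., 1968; 1900 itself was not a leap year)
    my $days = 70 * 365 + 17;
    my $SECS_of_70_YEARS = $days * 24 * 60 * 60;
    print "$SECS_of_70_YEARS\n";        # prints 2208988800

    # a made-up reply as it would appear on the wire:
    # 32 bits, network byte order, one day after the Unix epoch
    my $rtime   = pack("N", $SECS_of_70_YEARS + 86400);
    my $histime = unpack("N", $rtime) - $SECS_of_70_YEARS;
    print scalar(gmtime($histime)), "\n";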

=head1 SysV IPC

While System V IPC isn't so widely used as sockets, it still has some
interesting uses.  You can't, however, effectively use SysV IPC or
Berkeley mmap() to have shared memory so as to share a variable amongst
several processes.  That's because Perl would reallocate your string when
you weren't wanting it to.

Here's a small example showing shared memory usage.

    use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRWXU);

    $size = 2000;
    $id = shmget(IPC_PRIVATE, $size, S_IRWXU) || die "$!";
    print "shm key $id\n";

    $message = "Message #1";
    shmwrite($id, $message, 0, 60) || die "$!";
    print "wrote: '$message'\n";
    shmread($id, $buff, 0, 60) || die "$!";
    print "read : '$buff'\n";

    # the buffer of shmread is zero-character end-padded.
    substr($buff, index($buff, "\0")) = '';
    print "un" unless $buff eq $message;
    print "swell\n";

    print "deleting shm $id\n";
    shmctl($id, IPC_RMID, 0) || die "$!";
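
The substr() and index() idiom above relies on the buffer containing at
least one NUL; if the message filled the buffer exactly, index() would
return -1 and the last character would be chopped off.  One alternative,
offered as a sketch, is C<unpack> with the C<Z*> template.  The buffer
here is built by hand to imitate what shmread() returns, so the fragment
needs no shared memory to run.

    use strict;

    # stand-in for what shmread() hands back: the message plus padding
    my $message = "Message #1";
    my $buff    = $message . ("\0" x (60 - length $message));

    my $text = unpack("Z*", $buff);     # everything before the first NUL
    print "read : '$text'\n";

    # a buffer with no padding at all comes through whole as well
    my $full = unpack("Z*", "x" x 60);
    print length($full), "\n";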

Here's an example of a semaphore:

    use IPC::SysV qw(IPC_CREAT);

    $IPC_KEY = 1234;
    $id = semget($IPC_KEY, 10, 0666 | IPC_CREAT ) || die "$!";
    print "sem id $id\n";

Put this code in a separate file to be run in more than one process.
Call the file F<take>:

    # look up the semaphore set created above

    $IPC_KEY = 1234;
    $id = semget($IPC_KEY,  0 , 0 );
    die if !defined($id);

    $semnum = 0;
    $semflag = 0;

    # 'take' semaphore
    # wait for semaphore to be zero
    $semop = 0;
    $opstring1 = pack("s!s!s!", $semnum, $semop, $semflag);

    # Increment the semaphore count
    $semop = 1;
    $opstring2 = pack("s!s!s!", $semnum, $semop,  $semflag);
    $opstring = $opstring1 . $opstring2;

    semop($id,$opstring) || die "$!";

Put this code in a separate file to be run in more than one process.
Call this file F<give>:

    # 'give' the semaphore
    # run this in the original process and you will see
    # that the second process continues

    $IPC_KEY = 1234;
    $id = semget($IPC_KEY, 0, 0);
    die if !defined($id);

    $semnum = 0;
    $semflag = 0;

    # Decrement the semaphore count
    $semop = -1;
    $opstring = pack("s!s!s!", $semnum, $semop, $semflag);

    semop($id,$opstring) || die "$!";
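
The operation strings passed to semop() are easier to follow once you
see their layout.  Each operation is three native shorts (semaphore
number, operation, flags), and several operations can be concatenated so
that the system applies them as one atomic set.  This sketch only builds
and decodes such strings; it makes no IPC calls, and the variable names
are illustrative.

    use strict;

    # one operation is three native shorts: sem_num, sem_op, sem_flg
    my $wait_zero = pack("s!s!s!", 0, 0, 0);  # block until sem 0 is zero
    my $increment = pack("s!s!s!", 0, 1, 0);  # then add one to it
    my $opstring  = $wait_zero . $increment;

    my $op_size = length pack("s!s!s!", 0, 0, 0);
    my $count   = length($opstring) / $op_size;
    print "$count operations\n";

    # decode the string back into (sem_num, sem_op, sem_flg) triples
    my @fields = unpack("(s!s!s!)*", $opstring);
    while (my ($num, $op, $flg) = splice(@fields, 0, 3)) {
        print "semaphore $num: op $op, flags $flg\n";
    }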

The SysV IPC code above was written long ago, and it's definitely
clunky looking.  For a more modern look, see the IPC::SysV module
which is included with Perl starting from Perl 5.005.

A small example demonstrating SysV message queues:

    use IPC::SysV qw(IPC_PRIVATE IPC_RMID IPC_CREAT S_IRWXU);

    my $id = msgget(IPC_PRIVATE, IPC_CREAT | S_IRWXU);

    my $sent = "message";
    my $type_sent = 1234;
    my $rcvd;
    my $type_rcvd;

    if (defined $id) {
        if (msgsnd($id, pack("l! a*", $type_sent, $sent), 0)) {
            if (msgrcv($id, $rcvd, 60, 0, 0)) {
                ($type_rcvd, $rcvd) = unpack("l! a*", $rcvd);
                if ($rcvd eq $sent) {
                    print "okay\n";
                } else {
                    print "not okay\n";
                }
            } else {
                die "# msgrcv failed\n";
            }
        } else {
            die "# msgsnd failed\n";
        }
        msgctl($id, IPC_RMID, 0) || die "# msgctl failed: $!\n";
    } else {
        die "# msgget failed\n";
    }

=head1 NOTES

Most of these routines quietly but politely return C<undef> when they
fail instead of causing your program to die right then and there due to
an uncaught exception.  (Actually, some of the new I<Socket> conversion
functions croak() on bad arguments.)  It is therefore essential to
check return values from these functions.  Always begin your socket
programs this way for optimal success, and don't forget to add the B<-T>
taint-checking flag to the #! line for servers:

    #!/usr/bin/perl -Tw
    use strict;
    use sigtrap;
    use Socket;

=head1 BUGS

All these routines create system-specific portability problems.  As noted
elsewhere, Perl is at the mercy of your C libraries for much of its system
behaviour.  It's probably safest to assume broken SysV semantics for
signals and to stick with simple TCP and UDP socket operations; e.g., don't
try to pass open file descriptors over a local UDP datagram socket if you
want your code to stand a chance of being portable.

As mentioned in the signals section, because few vendors provide C
libraries that are safely re-entrant, the prudent programmer will do
little else within a handler beyond setting a numeric variable that
already exists; or, if locked into a slow (restarting) system call,
using die() to raise an exception and longjmp(3) out.  In fact, even
these may in some cases cause a core dump.  It's probably best to avoid
signals except where they are absolutely inevitable.  This 
will be addressed in a future release of Perl.

=head1 AUTHOR

Tom Christiansen, with occasional vestiges of Larry Wall's original
version and suggestions from the Perl Porters.

=head1 SEE ALSO

There's a lot more to networking than this, but this should get you
started.

For intrepid programmers, the indispensable textbook is I<Unix Network
Programming> by W. Richard Stevens (published by Addison-Wesley).  Note
that most books on networking address networking from the perspective of
a C programmer; translation to Perl is left as an exercise for the reader.

The IO::Socket(3) manpage describes the object library, and the Socket(3)
manpage describes the low-level interface to sockets.  Besides the obvious
functions in L<perlfunc>, you should also check out the F<modules> file
at your nearest CPAN site.  (See L<perlmodlib> or best yet, the F<Perl
FAQ> for a description of what CPAN is and where to get it.)

Section 5 of the F<modules> file is devoted to "Networking, Device Control
(modems), and Interprocess Communication", and contains numerous unbundled
modules for networking, Chat and Expect operations, CGI
programming, DCE, FTP, IPC, NNTP, Proxy, Ptty, RPC, SNMP, SMTP, Telnet,
Threads, and ToolTalk--just to name a few.

=head1 NAME

perllexwarn - Perl Lexical Warnings

=head1 DESCRIPTION

The C<use warnings> pragma is a replacement for both the command line
flag B<-w> and the equivalent Perl variable, C<$^W>.

The pragma works just like the existing "strict" pragma.
This means that the scope of the warning pragma is limited to the
enclosing block. It also means that the pragma setting will not
leak across files (via C<use>, C<require> or C<do>). This allows
authors to independently define the degree of warning checks that will
be applied to their module.

By default, optional warnings are disabled, so any legacy code that
doesn't attempt to control the warnings will work unchanged.

All warnings are enabled in a block by either of these:

    use warnings ;
    use warnings 'all' ;

Similarly all warnings are disabled in a block by either of these:

    no warnings ;
    no warnings 'all' ;

For example, consider the code below:

    use warnings ;
    my @a ;
    {
        no warnings ;
	my $b = @a[0] ;
    }
    my $c = @a[0];

The code in the enclosing block has warnings enabled, but the inner
block has them disabled. In this case that means the assignment to the
scalar C<$c> will trip the C<"Scalar value @a[0] better written as $a[0]">
warning, but the assignment to the scalar C<$b> will not.
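
The same scoping can be observed at run time.  The sketch below swaps
the compile-time C<@a[0]> warning for a run-time "uninitialized" warning,
because a C<$SIG{__WARN__}> handler installed at run time only sees
warnings issued after it is in place.  The handler and the variable names
are just for this illustration.

    use strict;
    use warnings;

    my @caught;
    $SIG{__WARN__} = sub { push @caught, $_[0] };

    my $undef;
    {
        no warnings;
        my $quiet = "value: " . $undef;    # warnings off in this block
    }
    my $loud = "value: " . $undef;         # warnings back on here

    print scalar(@caught), " warning(s) caught\n";
    print $caught[0];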

=head2 Default Warnings and Optional Warnings

Before the introduction of lexical warnings, Perl had two classes of
warnings: mandatory and optional. 

As its name suggests, if your code tripped a mandatory warning, you
would get a warning whether you wanted it or not.
For example, the code below would always produce an C<"isn't numeric">
warning about the "2:".

    my $a = "2:" + 3;

With the introduction of lexical warnings, mandatory warnings now become
I<default> warnings. The difference is that although the previously
mandatory warnings are still enabled by default, they can
subsequently be enabled or disabled with the lexical warning pragma. For
example, in the code below, an C<"isn't numeric"> warning will only
be reported for the C<$a> variable.

    my $a = "2:" + 3;
    no warnings ;
    my $b = "2:" + 3;

Note that neither the B<-w> flag nor the C<$^W> variable can be used to
disable/enable default warnings. They are still mandatory in this case.

=head2 What's wrong with B<-w> and C<$^W>

Although very useful, the big problem with using B<-w> on the command
line to enable warnings is that it is all or nothing. Take the typical
scenario when you are writing a Perl program. Parts of the code you
will write yourself, but it's very likely that you will make use of
pre-written Perl modules. If you use the B<-w> flag in this case, you
end up enabling warnings in pieces of code that you haven't written.

Similarly, using C<$^W> to either disable or enable blocks of code is
fundamentally flawed. For a start, say you want to disable warnings in
a block of code. You might expect this to be enough to do the trick:

     {
         local ($^W) = 0 ;
	 my $a =+ 2 ;
	 my $b ; chop $b ;
     }

When this code is run with the B<-w> flag, a warning will be produced
for the C<$a> line -- C<"Reversed += operator">.

The problem is that Perl has both compile-time and run-time warnings. To
disable compile-time warnings you need to rewrite the code like this:

     {
         BEGIN { $^W = 0 }
	 my $a =+ 2 ;
	 my $b ; chop $b ;
     }

The other big problem with C<$^W> is the way you can inadvertently
change the warning setting in unexpected places in your code. For example,
when the code below is run (without the B<-w> flag), the second call
to C<doit> will trip a C<"Use of uninitialized value"> warning, whereas
the first will not.

    sub doit
    {
        my $b ; chop $b ;
    }

    doit() ;

    {
        local ($^W) = 1 ;
        doit()
    }

This is a side-effect of C<$^W> being dynamically scoped.

Lexical warnings get around these limitations by allowing finer control
over where warnings can or can't be tripped.
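
As a contrast with the C<$^W> example, this sketch shows that with the
pragma a subroutine keeps the setting it was compiled under, whatever
the caller's setting happens to be.  The subroutine names and the
C<$SIG{__WARN__}> handler used to count warnings are assumptions of the
example.

    use strict;

    my @caught;
    $SIG{__WARN__} = sub { push @caught, $_[0] };

    {
        no warnings;
        # compiled with warnings off
        sub doit_quietly { my $u; return "got: $u" }
    }

    {
        use warnings;
        # compiled with warnings on
        sub doit_loudly  { my $u; return "got: $u" }
        doit_quietly();           # the caller's setting is ignored
    }

    {
        no warnings;
        doit_loudly();            # still warns
    }

    print scalar(@caught), " warning(s) caught\n";

Only the call to C<doit_loudly> should add to C<@caught>.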

=head2 Controlling Warnings from the Command Line

There are three Command Line flags that can be used to control when
warnings are (or aren't) produced:

=over 5

=item B<-w>

This is the existing flag. If the lexical warnings pragma is B<not>
used in any of your code, or any of the modules that you use, this flag
will enable warnings everywhere. See L<Backward Compatibility> for
details of how this flag interacts with lexical warnings.

=item B<-W>

If the B<-W> flag is used on the command line, it will enable all warnings
throughout the program regardless of whether warnings were disabled
locally using C<no warnings> or C<$^W =0>. This includes all files that get
included via C<use>, C<require> or C<do>.
Think of it as the Perl equivalent of the "lint" command.

=item B<-X>

Does the exact opposite of the B<-W> flag, i.e. it disables all warnings.

=back

=head2 Backward Compatibility

If you are used to working with a version of Perl prior to the
introduction of lexically scoped warnings, or have code that uses both
lexical warnings and C<$^W>, this section will describe how they interact.

How Lexical Warnings interact with B<-w>/C<$^W>:

=over 5

=item 1.

If none of the three command line flags (B<-w>, B<-W> or B<-X>) that
control warnings is used and neither C<$^W> nor the C<warnings> pragma
is used, then default warnings will be enabled and optional warnings
disabled.
This means that legacy code that doesn't attempt to control the warnings
will work unchanged.

=item 2.

The B<-w> flag just sets the global C<$^W> variable as in 5.005 -- this
means that any legacy code that currently relies on manipulating C<$^W>
to control warning behavior will still work as is. 

=item 3.

Apart from now being a boolean, the C<$^W> variable operates in exactly
the same horrible uncontrolled global way, except that it cannot
disable/enable default warnings.

=item 4.

If a piece of code is under the control of the C<warnings> pragma,
both the C<$^W> variable and the B<-w> flag will be ignored for the
scope of the lexical warning.

=item 5.

The only way to override a lexical warnings setting is with the B<-W>
or B<-X> command line flags.

=back

The combined effect of 3 & 4 is that it will allow code which uses
the C<warnings> pragma to control the warning behavior of $^W-type
code (using a C<local $^W=0>) if it really wants to, but not vice-versa.
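
Rule 4 can be seen in a short sketch, assuming the script is run without
any of the B<-w>, B<-W> or B<-X> flags.  The subroutine names C<legacy>
and C<modern> and the C<$SIG{__WARN__}> counter are made up for the
example.

    use strict;

    my @caught;
    $SIG{__WARN__} = sub { push @caught, $_[0] };

    # no warnings pragma in scope here, so this sub obeys $^W
    sub legacy { my $u; return "got: $u" }

    {
        no warnings;
        # under the pragma, so this sub ignores $^W entirely
        sub modern { my $u; return "got: $u" }
    }

    {
        local $^W = 1;
        legacy();       # warns: $^W is on and no pragma overrides it
        modern();       # silent: "no warnings" wins over $^W
    }
    legacy();           # silent again once $^W is restored

    print scalar(@caught), " warning(s) caught\n";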

=head2 Category Hierarchy

A hierarchy of "categories" has been defined to allow groups of warnings
to be enabled/disabled in isolation.

The current hierarchy is:

  all -+
       |
       +- chmod
       |
       +- closure
       |
       +- exiting
       |
       +- glob
       |
       +- io -----------+
       |                |
       |                +- closed
       |                |
       |                +- exec
       |                |
       |                +- newline
       |                |
       |                +- pipe
       |                |
       |                +- unopened
       |
       +- misc
       |
       +- numeric
       |
       +- once
       |
       +- overflow
       |
       +- pack
       |
       +- portable
       |
       +- recursion
       |
       +- redefine
       |
       +- regexp
       |
       +- severe -------+
       |                |
       |                +- debugging
       |                |
       |                +- inplace
       |                |
       |                +- internal
       |                |
       |                +- malloc
       |
       +- signal
       |
       +- substr
       |
       +- syntax -------+
       |                |
       |                +- ambiguous
       |                |
       |                +- bareword
       |                |
       |                +- deprecated
       |                |
       |                +- digit
       |                |
       |                +- parenthesis
       |                |
       |                +- precedence
       |                |
       |                +- printf
       |                |
       |                +- prototype
       |                |
       |                +- qw
       |                |
       |                +- reserved
       |                |
       |                +- semicolon
       |
       +- taint
       |
       +- umask
       |
       +- uninitialized
       |
       +- unpack
       |
       +- untie
       |
       +- utf8
       |
       +- void
       |
       +- y2k

Just like the "strict" pragma, any of these categories can be combined:

    use warnings qw(void redefine) ;
    no warnings qw(io syntax untie) ;

Also like the "strict" pragma, if there is more than one instance of the
C<warnings> pragma in a given scope the cumulative effect is additive. 

    use warnings qw(void) ; # only "void" warnings enabled
    ...
    use warnings qw(io) ;   # only "void" & "io" warnings enabled
    ...
    no warnings qw(void) ;  # only "io" warnings enabled

To determine which category a specific warning has been assigned to, see
L<perldiag>.

=head2 Fatal Warnings

The presence of the word "FATAL" in the category list will escalate any
warnings detected from the categories specified in the lexical scope
into fatal errors. In the code below, the use of C<time>, C<length>
and C<join> can all produce a C<"Useless use of xxx in void context">
warning.

    use warnings ;

    time ;

    {
        use warnings FATAL => qw(void) ;
        length "abc" ;
    }

    join "", 1,2,3 ;

    print "done\n" ;     

When run, it produces this output:

    Useless use of time in void context at fatal line 3.
    Useless use of length in void context at fatal line 7.  

The scope where C<length> is used has escalated the C<void> warnings
category into a fatal error, so the program terminates immediately it
encounters the warning.
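An escalated warning is raised like any other fatal error, so it can be trapped with C<eval>. The following is a minimal sketch rather than part of the example above; it uses the C<numeric> category so that the warning is triggered at run time, and the variable names are invented:

```perl
use strict;
use warnings;

my $input = "abc";

my $ok = eval {
    use warnings FATAL => qw(numeric);
    my $n = $input + 1;    # "isn't numeric" is now an exception
    1;
};

# $@ holds the text of the warning that was turned into an error.
my $error = $ok ? "" : $@;
print "trapped: $error" unless $ok;
```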


=head2 Reporting Warnings from a Module

The C<warnings> pragma provides a number of functions that are useful for
module authors. These are used when you want to report a module-specific
warning to a calling module that has enabled warnings via the C<warnings>
pragma.

Consider the module C<MyMod::Abc> below.

    package MyMod::Abc;

    use warnings::register;

    sub open {
        my $path = shift ;
        if (warnings::enabled() && $path !~ m#^/#) {
            warnings::warn("changing relative path to /tmp/");
            $path = "/tmp/$path" ; 
        }
    }

    1 ;

The call to C<warnings::register> will create a new warnings category
called "MyMod::Abc", i.e. the new category name matches the current
package name. The C<open> function in the module will display a warning
message if it gets given a relative path as a parameter. This warning
will only be displayed if the code that uses C<MyMod::Abc> has actually
enabled it with the C<warnings> pragma like below.

    use MyMod::Abc;
    use warnings 'MyMod::Abc';
    ...
    MyMod::Abc::open("../fred.txt");

It is also possible to test whether the pre-defined warnings categories are
set in the calling module with the C<warnings::enabled> function, or to
test and warn in one step with C<warnings::warnif>. Consider this snippet
of code:

    package MyMod::Abc;

    sub open {
        warnings::warnif("deprecated", 
                         "open is deprecated, use new instead") ;
        new(@_) ;
    }

    sub new
    ...
    1 ;

The function C<open> has been deprecated, so code has been included to
display a warning message whenever the calling module has (at least) the
"deprecated" warnings category enabled. Something like this, say.

    use warnings 'deprecated';
    use MyMod::Abc;
    ...
    MyMod::Abc::open($filename) ;

Either the C<warnings::warn> or C<warnings::warnif> function should be
used to actually display the warnings message. This is because they can
make use of the feature that allows warnings to be escalated into fatal
errors. So in this case

    use MyMod::Abc;
    use warnings FATAL => 'MyMod::Abc';
    ...
    MyMod::Abc::open('../fred.txt');

the C<warnings::warnif> function will detect this and die after
displaying the warning message.
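The three cases (category disabled, enabled, and made fatal) can be tried in a single file. This is a sketch only: the package C<MyMod::Demo> and its C<open_path> function are hypothetical stand-ins for a real module, defined inline so that the example is self-contained.

```perl
use strict;

# Hypothetical module, defined inline for the sake of the sketch.
package MyMod::Demo;
use warnings::register;

sub open_path {
    my $path = shift;
    # warnif() warns only if the caller enabled the "MyMod::Demo"
    # category, and dies instead if the caller made it FATAL.
    warnings::warnif("changing relative path to /tmp/")
        if $path !~ m#^/#;
    return $path =~ m#^/# ? $path : "/tmp/$path";
}

package main;

my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0] };

{
    no warnings;
    MyMod::Demo::open_path("fred.txt");    # silent
}
{
    use warnings 'MyMod::Demo';
    MyMod::Demo::open_path("fred.txt");    # warns
}
my $died = !eval {
    use warnings FATAL => 'MyMod::Demo';
    MyMod::Demo::open_path("fred.txt");    # dies
    1;
};
print "warnings: ", scalar(@seen), ", died: ", ($died ? "yes" : "no"), "\n";
```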

The three warnings functions, C<warnings::warn>, C<warnings::warnif>
and C<warnings::enabled> can optionally take an object reference in place
of a category name. In this case the functions will use the class name
of the object as the warnings category.

Consider this example:

    package Original ;

    no warnings ;
    use warnings::register ;

    sub new
    {
        my $class = shift ;
        bless [], $class ;
    }

    sub check
    {
        my $self = shift ;
        my $value = shift ;

        if ($value % 2 && warnings::enabled($self))
          { warnings::warn($self, "Odd numbers are unsafe") }
    }

    sub doit
    {
        my $self = shift ;
        my $value = shift ;
        $self->check($value) ;
        # ...
    }

    1 ;

    package Derived ;

    use warnings::register ;
    use Original ;
    our @ISA = qw( Original ) ;
    sub new
    {
        my $class = shift ;
        bless [], $class ;
    }


    1 ;

The code below makes use of both modules, but it only enables warnings from 
C<Derived>.

    use Original ;
    use Derived ;
    use warnings 'Derived';
    my $a = new Original ;
    $a->doit(1) ;
    my $b = new Derived ;
    $b->doit(1) ;

When this code is run only the C<Derived> object, C<$b>, will generate
a warning. 

    Odd numbers are unsafe at main.pl line 7

Notice also that the warning is reported at the line where the object is first
used.

=head1 TODO

  perl5db.pl
    The debugger saves and restores C<$^W> at runtime. I haven't checked
    whether the debugger will still work with the lexical warnings
    patch applied.

  diagnostics.pm
    I *think* I've got diagnostics to work with the lexical warnings
    patch, but there were design decisions made in diagnostics to work
    around the limitations of C<$^W>. Now that those limitations are gone,
    the module should be revisited.

  document calling the warnings::* functions from XS

=head1 SEE ALSO

L<warnings>, L<perldiag>.

=head1 AUTHOR

Paul Marquess

=head1 NAME

perllocale - Perl locale handling (internationalization and localization)

=head1 DESCRIPTION

Perl supports language-specific notions of data such as "is this
a letter", "what is the uppercase equivalent of this letter", and
"which of these letters comes first".  These are important issues,
especially for languages other than English--but also for English: it
would be naE<iuml>ve to imagine that C<A-Za-z> defines all the "letters"
needed to write in English. Perl is also aware that some character other
than '.' may be preferred as a decimal point, and that output date
representations may be language-specific.  The process of making an
application take account of its users' preferences in such matters is
called B<internationalization> (often abbreviated as B<i18n>); telling
such an application about a particular set of preferences is known as
B<localization> (B<l10n>).

Perl can understand language-specific data via the standardized (ISO C,
XPG4, POSIX 1.c) method called "the locale system". The locale system is
controlled per application using one pragma, one function call, and
several environment variables.

B<NOTE>: This feature is new in Perl 5.004, and does not apply unless an
application specifically requests it--see L<Backward compatibility>.
The one exception is that write() now B<always> uses the current
locale--see L<"NOTES">.

=head1 PREPARING TO USE LOCALES

If Perl applications are to understand and present your data
correctly according to a locale of your choice, B<all> of the following
must be true:

=over 4

=item *

B<Your operating system must support the locale system>.  If it does,
you should find that the setlocale() function is a documented part of
its C library.

=item *

B<Definitions for locales that you use must be installed>.  You, or
your system administrator, must make sure that this is the case. The
available locales, the location in which they are kept, and the manner
in which they are installed all vary from system to system.  Some systems
provide only a few, hard-wired locales and do not allow more to be
added.  Others allow you to add "canned" locales provided by the system
supplier.  Still others allow you or the system administrator to define
and add arbitrary locales.  (You may have to ask your supplier to
provide canned locales that are not delivered with your operating
system.)  Read your system documentation for further illumination.

=item *

B<Perl must believe that the locale system is supported>.  If it does,
C<perl -V:d_setlocale> will say that the value for C<d_setlocale> is
C<define>.

=back
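The last check can also be made from inside a program through the standard C<Config> module. A minimal sketch; the helper name C<perl_has_setlocale> is made up for this example:

```perl
use strict;
use warnings;
use Config;

# True if this perl was configured with setlocale() support,
# i.e. if "perl -V:d_setlocale" would report "define".
sub perl_has_setlocale {
    my $value = $Config{d_setlocale};
    return (defined $value && $value eq 'define') ? 1 : 0;
}

print "setlocale() support: ",
      (perl_has_setlocale() ? "yes" : "no"), "\n";
```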

If you want a Perl application to process and present your data
according to a particular locale, the application code should include
the S<C<use locale>> pragma (see L<The use locale pragma>) where
appropriate, and B<at least one> of the following must be true:

=over 4

=item *

B<The locale-determining environment variables (see L<"ENVIRONMENT">)
must be correctly set up> at the time the application is started, either
by yourself or by whoever set up your system account.

=item *

B<The application must set its own locale> using the method described in
L<The setlocale function>.

=back

=head1 USING LOCALES

=head2 The use locale pragma

By default, Perl ignores the current locale.  The S<C<use locale>>
pragma tells Perl to use the current locale for some operations:

=over 4

=item *

B<The comparison operators> (C<lt>, C<le>, C<cmp>, C<ge>, and C<gt>) and
the POSIX string collation functions strcoll() and strxfrm() use
C<LC_COLLATE>.  sort() is also affected if used without an
explicit comparison function, because it uses C<cmp> by default.

B<Note:> C<eq> and C<ne> are unaffected by locale: they always
perform a byte-by-byte comparison of their scalar operands.  What's
more, if C<cmp> finds that its operands are equal according to the
collation sequence specified by the current locale, it goes on to
perform a byte-by-byte comparison, and only returns I<0> (equal) if the
operands are bit-for-bit identical.  If you really want to know whether
two strings--which C<eq> and C<cmp> may consider different--are equal
as far as collation in the locale is concerned, see the discussion in
L<Category LC_COLLATE: Collation>.

=item *

B<Regular expressions and case-modification functions> (uc(), lc(),
ucfirst(), and lcfirst()) use C<LC_CTYPE>.

=item *

B<The formatting functions> (printf(), sprintf() and write()) use
C<LC_NUMERIC>.

=item *

B<The POSIX date formatting function> (strftime()) uses C<LC_TIME>.

=back

C<LC_COLLATE>, C<LC_CTYPE>, and so on, are discussed further in 
L<LOCALE CATEGORIES>.

The default behavior is restored with the S<C<no locale>> pragma, or
upon reaching the end of the block enclosing C<use locale>.

The string result of any operation that uses locale
information is tainted, as it is possible for a locale to be
untrustworthy.  See L<"SECURITY">.
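The lexical scope of the pragma can be illustrated with sort(). This is a sketch with an invented word list; what the locale-aware ordering looks like depends entirely on the C<LC_COLLATE> locale in force when it is run:

```perl
use strict;
use warnings;

my @words = qw(banana Apple cherry apple Banana);

# Outside "use locale": plain byte-by-byte ordering.
my @native = sort @words;

my @by_locale;
{
    use locale;
    # Inside the block: ordering follows the current LC_COLLATE locale.
    @by_locale = sort @words;
}

# After the block the default behavior is back in force.
my @native_again = sort @words;

print "native: @native\n";
print "locale: @by_locale\n";
```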

=head2 The setlocale function

You can switch locales as often as you wish at run time with the
POSIX::setlocale() function:

        # This functionality not usable prior to Perl 5.004
        require 5.004;

        # Import locale-handling tool set from POSIX module.
        # This example uses: setlocale -- the function call
        #                    LC_CTYPE -- explained below
        use POSIX qw(locale_h);

        # query and save the old locale
        $old_locale = setlocale(LC_CTYPE);

        setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
        # LC_CTYPE now in locale "French, Canada, codeset ISO 8859-1"

        setlocale(LC_CTYPE, "");
        # LC_CTYPE now reset to default defined by LC_ALL/LC_CTYPE/LANG
        # environment variables.  See below for documentation.

        # restore the old locale
        setlocale(LC_CTYPE, $old_locale);

The first argument of setlocale() gives the B<category>, the second the
B<locale>.  The category tells in what aspect of data processing you
want to apply locale-specific rules.  Category names are discussed in
L<LOCALE CATEGORIES> and L<"ENVIRONMENT">.  The locale is the name of a
collection of customization information corresponding to a particular
combination of language, country or territory, and codeset.  Read on for
hints on the naming of locales: not all systems name locales as in the
example.

If no second argument is provided and the category is something other
than LC_ALL, the function returns a string naming the current locale
for the category.  You can use this value as the second argument in a
subsequent call to setlocale().

If no second argument is provided and the category is LC_ALL, the
result is implementation-dependent.  It may be a string of
concatenated locale names (separator also implementation-dependent)
or a single locale name.  Please consult your L<setlocale(3)> for
details.

If a second argument is given and it corresponds to a valid locale,
the locale for the category is set to that value, and the function
returns the now-current locale value.  You can then use this in yet
another call to setlocale().  (In some implementations, the return
value may sometimes differ from the value you gave as the second
argument--think of it as an alias for the value you gave.)

As the example shows, if the second argument is an empty string, the
category's locale is returned to the default specified by the
corresponding environment variables.  Generally, this results in a
return to the default that was in force when Perl started up: changes
to the environment made by the application after startup may or may not
be noticed, depending on your system's C library.

If the second argument does not correspond to a valid locale, the locale
for the category is not changed, and the function returns I<undef>.
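The return values described above can be checked directly. A sketch, assuming a system whose C library rejects unknown names; the name C<xx_NOSUCH.locale> is deliberately made up, and the variable names are illustrative:

```perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Query: one argument returns the name of the current locale.
my $old_locale = setlocale(LC_CTYPE);

# Set: a valid name returns the now-current locale name.
my $set_result = setlocale(LC_CTYPE, "C");

# A name the system does not know: undef comes back and the
# locale for the category stays as it was.
my $bad_result = setlocale(LC_CTYPE, "xx_NOSUCH.locale");
my $after_bad  = setlocale(LC_CTYPE);

# Restore whatever was in force before.
setlocale(LC_CTYPE, $old_locale) if defined $old_locale;

print "set returned: $set_result\n";
print "bad name ", (defined $bad_result ? "accepted" : "rejected"), "\n";
```

Always test the return value of setlocale() with C<defined> rather than assuming that the call succeeded.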

For further information about the categories, consult L<setlocale(3)>.

=head2 Finding locales

For locales available in your system, consult also L<setlocale(3)> to
see whether it leads to the list of available locales (search for the
I<SEE ALSO> section).  If that fails, try the following command lines:

        locale -a

        nlsinfo

        ls /usr/lib/nls/loc

        ls /usr/lib/locale

        ls /usr/lib/nls

        ls /usr/share/locale

and see whether they list something resembling these:

        en_US.ISO8859-1     de_DE.ISO8859-1     ru_RU.ISO8859-5
        en_US.iso88591      de_DE.iso88591      ru_RU.iso88595
        en_US               de_DE               ru_RU
        en                  de                  ru
        english             german              russian
        english.iso88591    german.iso88591     russian.iso88595
        english.roman8                          russian.koi8r

Sadly, even though the calling interface for setlocale() has been
standardized, names of locales and the directories where the
configuration resides have not been.  The basic form of the name is
I<language_territory>B<.>I<codeset>, but the latter parts after
I<language> are not always present.  The I<language> and I<territory>
are usually from the standards B<ISO 639> and B<ISO 3166>, the
two-letter abbreviations for the languages and the countries of the
world, respectively.  The I<codeset> part often mentions some B<ISO
8859> character set, the Latin codesets.  For example, C<ISO 8859-1>
is the so-called "Western European codeset" that can be used to encode
most Western European languages adequately.  Again, there are several
ways to write even the name of that one standard.  Lamentably.

Two special locales are worth particular mention: "C" and "POSIX".
Currently these are effectively the same locale: the difference is
mainly that the first one is defined by the C standard, the second by
the POSIX standard.  They define the B<default locale> in which
every program starts in the absence of locale information in its
environment.  (The I<default> default locale, if you will.)  Its language
is (American) English and its character codeset ASCII.

B<NOTE>: Not all systems have the "POSIX" locale (not all systems are
POSIX-conformant), so use "C" when you need explicitly to specify this
default locale.
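Because names differ between systems, a program can try several spellings and fall back to "C". The helper below is a sketch, not a standard function: C<first_available_locale> is a name made up for this example, and the candidate list merely echoes the spellings shown above.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Return the first candidate name that setlocale() accepts for
# LC_CTYPE, or undef if none is.  The previous locale is restored.
sub first_available_locale {
    my @candidates = @_;
    my $saved = setlocale(LC_CTYPE);
    my $found;
    for my $name (@candidates) {
        if (defined setlocale(LC_CTYPE, $name)) {
            $found = $name;
            last;
        }
    }
    setlocale(LC_CTYPE, $saved) if defined $saved;
    return $found;
}

my $name = first_available_locale(
    "en_US.ISO8859-1", "en_US.iso88591", "en_US", "english", "C");
print "usable locale: $name\n";
```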

=head2 LOCALE PROBLEMS

You may encounter the following warning message at Perl startup:

	perl: warning: Setting locale failed.
	perl: warning: Please check that your locale settings:
	        LC_ALL = "En_US",
	        LANG = (unset)
	    are supported and installed on your system.
	perl: warning: Falling back to the standard locale ("C").

This means that your locale settings had LC_ALL set to "En_US" and
LANG was not set at all.  Perl tried to believe you but could not.
Instead, Perl gave up and fell back to the "C" locale, the default locale
that is supposed to work no matter what.  This usually means your locale
settings were wrong, they mention locales your system has never heard
of, or the locale installation in your system has problems (for example,
some system files are broken or missing).  There are quick and temporary
fixes to these problems, as well as more thorough and lasting fixes.

=head2 Temporarily fixing locale problems

The two quickest fixes are either to render Perl silent about any
locale inconsistencies or to run Perl under the default locale "C".

Perl's moaning about locale problems can be silenced by setting the
environment variable PERL_BADLANG to a zero value, for example "0".
This method really just sweeps the problem under the carpet: you tell
Perl to shut up even when Perl sees that something is wrong.  Do not
be surprised if later something locale-dependent misbehaves.

Perl can be run under the "C" locale by setting the environment
variable LC_ALL to "C".  This method is perhaps a bit more civilized
than the PERL_BADLANG approach, but setting LC_ALL (or
other locale variables) may affect other programs as well, not just
Perl.  In particular, external programs run from within Perl will see
these changes.  If you make the new settings permanent (read on), all
programs you run see the changes.  See L<ENVIRONMENT> for
the full list of relevant environment variables and L<USING LOCALES>
for their effects in Perl.  Effects in other programs are 
easily deducible.  For example, the variable LC_COLLATE may well affect
your B<sort> program (or whatever the program that arranges `records'
alphabetically in your system is called).

You can test out changing these variables temporarily, and if the
new settings seem to help, put those settings into your shell startup
files.  Consult your local documentation for the exact details.  For
example, in Bourne-like shells (B<sh>, B<ksh>, B<bash>, B<zsh>):

	LC_ALL=en_US.ISO8859-1
	export LC_ALL

This assumes that we saw the locale "en_US.ISO8859-1" using the commands
discussed above.  We decided to try that instead of the above faulty
locale "En_US".  In csh-like shells (B<csh>, B<tcsh>):

	setenv LC_ALL en_US.ISO8859-1

If you do not know what shell you have, consult your local
helpdesk or the equivalent.
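From inside a Perl program, the same temporary change can be confined to one block, and to the external programs started from it, by localizing C<%ENV>. A sketch, assuming a system where the list form of a piped open() is available; C<child_lc_all> is a hypothetical helper that merely reports what a child process sees:

```perl
use strict;
use warnings;

# Run a child perl and report the LC_ALL value it sees.
sub child_lc_all {
    my $script = 'print defined $ENV{LC_ALL} ? $ENV{LC_ALL} : "(unset)"';
    open(my $child, "-|", $^X, "-e", $script)
        or die "Can't run $^X: $!";
    my $value = do { local $/; <$child> };
    close $child;
    return $value;
}

my $inside = do {
    # Affects only this block and the programs started from it.
    local $ENV{LC_ALL} = "C";
    child_lc_all();
};

print "child saw LC_ALL=$inside\n";
```

Once the block is left, C<$ENV{LC_ALL}> reverts to its previous state, so later child processes are not affected.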

=head2 Permanently fixing locale problems

The slower but superior fix is to correct the misconfiguration of your
own environment variables yourself, where you are able to.  The
mis(sing)configuration of the whole system's locales usually requires
the help of your friendly system administrator.

First, see earlier in this document about L<Finding locales>.  That tells
how to find which locales are really supported--and more importantly,
installed--on your system.  In our example error message, environment
variables affecting the locale are listed in the order of decreasing
importance (and unset variables do not matter).  Therefore, having
LC_ALL set to "En_US" must have been the bad choice, as shown by the
error message.  First try fixing locale settings listed first.

Second, if using the listed commands you see something B<exactly>
(prefix matches do not count and case usually counts) like "En_US"
without the quotes, then you should be okay because you are using a
locale name that should be installed and available in your system.
In this case, see L<Permanently fixing your system's locale configuration>.

=head2 Permanently fixing your system's locale configuration

This is when you see something like:

	perl: warning: Please check that your locale settings:
	        LC_ALL = "En_US",
	        LANG = (unset)
	    are supported and installed on your system.

but then cannot see that "En_US" listed by the above-mentioned
commands.  You may see things like "en_US.ISO8859-1", but that isn't
the same.  In this case, try running under a locale
that you can list and which somehow matches what you tried.  The
rules for matching locale names are a bit vague because
standardization is weak in this area.  See L<Finding locales> again
for the general rules.

=head2 Fixing system locale configuration

Contact a system administrator (preferably your own) and report the exact
error message you get, and ask them to read this same documentation you
are now reading.  They should be able to check whether there is something
wrong with the locale configuration of the system.  The L<Finding locales>
section is unfortunately a bit vague about the exact commands and places
because these things are not that standardized.

=head2 The localeconv function

The POSIX::localeconv() function allows you to get particulars of the
locale-dependent numeric formatting information specified by the current
C<LC_NUMERIC> and C<LC_MONETARY> locales.  (If you just want the name of
the current locale for a particular category, use POSIX::setlocale()
with a single parameter--see L<The setlocale function>.)

        use POSIX qw(locale_h);

        # Get a reference to a hash of locale-dependent info
        $locale_values = localeconv();

        # Output sorted list of the values
        for (sort keys %$locale_values) {
            printf "%-20s = %s\n", $_, $locale_values->{$_}
        }

localeconv() takes no arguments, and returns B<a reference to> a hash.
The keys of this hash are variable names for formatting, such as
C<decimal_point> and C<thousands_sep>.  The values are the
corresponding, er, values.  See L<POSIX/localeconv> for a longer
example listing the categories an implementation might be expected to
provide; some provide more and others fewer.  You don't need an
explicit C<use locale>, because localeconv() always observes the
current locale.

Here's a simple-minded example program that rewrites its command-line
parameters as integers correctly formatted in the current locale:

        # See comments in previous example
        require 5.004;
        use POSIX qw(locale_h);

        # Get some of locale's numeric formatting parameters
        my ($thousands_sep, $grouping) =
             @{localeconv()}{'thousands_sep', 'grouping'};

        # Apply defaults if values are missing
        $thousands_sep = ',' unless $thousands_sep;

	# grouping and mon_grouping are packed lists
	# of small integers (characters) telling the
	# grouping (thousand_seps and mon_thousand_seps
	# being the group dividers) of numbers and
	# monetary quantities.  The integers' meanings:
	# 255 means no more grouping, 0 means repeat
	# the previous grouping, 1-254 means use that
	# as the current grouping.  Grouping goes from
	# right to left (low to high digits).  In the
	# below we cheat slightly by never using anything
	# else than the first grouping (whatever that is).
	if ($grouping) {
	    @grouping = unpack("C*", $grouping);
	} else {
	    @grouping = (3);
	}

        # Format command line params for current locale
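        for (@ARGV) {
            $_ = int;    # Chop non-integer part
            # Insert separators from the right; \Q...\E keeps a
            # separator such as '.' from acting as a regex metacharacter
            1 while
            s/(\d)(\d{$grouping[0]}($|\Q$thousands_sep\E))/$1$thousands_sep$2/;
            print "$_ ";  # Separate the formatted numbers with a space
        }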
        print "\n";

=head1 LOCALE CATEGORIES

The following subsections describe basic locale categories.  Beyond these,
some combination categories allow manipulation of more than one
basic category at a time.  See L<"ENVIRONMENT"> for a discussion of these.

=head2 Category LC_COLLATE: Collation

In the scope of S<C<use locale>>, Perl looks to the C<LC_COLLATE>
environment variable to determine the application's notions on collation
(ordering) of characters.  For example, 'b' follows 'a' in Latin
alphabets, but where do 'E<aacute>' and 'E<aring>' belong?  And while
'color' follows 'chocolate' in English, what about in Spanish?

The following collations all make sense and you may meet any of them
if you "use locale".

	A B C D E a b c d e
	A a B b C c D d E e
	a A b B c C d D e E
	a b c d e A B C D E

Here is a code snippet to tell what "word"
characters are in the current locale, in that locale's order:

        use locale;
        print +(sort grep /\w/, map { chr } 0..255), "\n";

Compare this with the characters that you see and their order if you
state explicitly that the locale should be ignored:

        no locale;
        print +(sort grep /\w/, map { chr } 0..255), "\n";

This machine-native collation (which is what you get unless S<C<use
locale>> has appeared earlier in the same block) must be used for
sorting raw binary data, whereas the locale-dependent collation of the
first example is useful for natural text.

As noted in L<USING LOCALES>, C<cmp> compares according to the current
collation locale when C<use locale> is in effect, but falls back to a
byte-by-byte comparison for strings that the locale says are equal. You
can use POSIX::strcoll() if you don't want this fall-back:

        use POSIX qw(strcoll);
        $equal_in_locale =
            !strcoll("space and case ignored", "SpaceAndCaseIgnored");

$equal_in_locale will be true if the collation locale specifies a
dictionary-like ordering that ignores space characters completely and
which folds case.

If you have a single string that you want to check for "equality in
locale" against several others, you might think you could gain a little
efficiency by using POSIX::strxfrm() in conjunction with C<eq>:

        use POSIX qw(strxfrm);
        $xfrm_string = strxfrm("Mixed-case string");
        print "locale collation ignores spaces\n"
            if $xfrm_string eq strxfrm("Mixed-casestring");
        print "locale collation ignores hyphens\n"
            if $xfrm_string eq strxfrm("Mixedcase string");
        print "locale collation ignores case\n"
            if $xfrm_string eq strxfrm("mixed-case string");

strxfrm() takes a string and maps it into a transformed string for use
in byte-by-byte comparisons against other transformed strings during
collation.  "Under the hood", locale-affected Perl comparison operators
call strxfrm() for both operands, then do a byte-by-byte
comparison of the transformed strings.  By calling strxfrm() explicitly
and using a non locale-affected comparison, the example attempts to save
a couple of transformations.  But in fact, it doesn't save anything: Perl
magic (see L<perlguts/Magic Variables>) creates the transformed version of a
string the first time it's needed in a comparison, then keeps this version around
in case it's needed again.  An example rewritten the easy way with
C<cmp> runs just about as fast.  It also copes with null characters
embedded in strings; if you call strxfrm() directly, it treats the first
null it finds as a terminator.  Don't expect the transformed strings
it produces to be portable across systems--or even from one revision
of your operating system to the next.  In short, don't call strxfrm()
directly: let Perl do it for you.

Note: C<use locale> isn't shown in some of these examples because it isn't
needed: strcoll() and strxfrm() exist only to generate locale-dependent
results, and so always obey the current C<LC_COLLATE> locale.

=head2 Category LC_CTYPE: Character Types

In the scope of S<C<use locale>>, Perl obeys the C<LC_CTYPE> locale
setting.  This controls the application's notion of which characters are
alphabetic.  This affects Perl's C<\w> regular expression metanotation,
which stands for alphanumeric characters--that is, alphabetic and
numeric characters, plus the underscore (but not the hyphen).
(Consult L<perlre> for more information about
regular expressions.)  Thanks to C<LC_CTYPE>, depending on your locale
setting, characters like 'E<aelig>', 'E<eth>', 'E<szlig>', and
'E<oslash>' may be understood as C<\w> characters.

The C<LC_CTYPE> locale also provides the map used in transliterating
characters between lower and uppercase.  This affects the case-mapping
functions--lc(), lcfirst(), uc(), and ucfirst(); case-mapping
interpolation with C<\l>, C<\L>, C<\u>, or C<\U> in double-quoted strings
and C<s///> substitutions; and case-independent regular expression
pattern matching using the C<i> modifier.

Finally, C<LC_CTYPE> affects the POSIX character-class test
functions--isalpha(), islower(), and so on.  For example, if you move
from the "C" locale to a 7-bit Scandinavian one, you may find--possibly
to your surprise--that "|" moves from the ispunct() class to isalpha().

B<Note:> A broken or malicious C<LC_CTYPE> locale definition may result
in clearly ineligible characters being considered to be alphanumeric by
your application.  For strict matching of (mundane) letters and
digits--for example, in command strings--locale-aware applications
should use C<\w> inside a C<no locale> block.  See L<"SECURITY">.
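
A minimal sketch of that advice follows.  The helper name
is_plain_word() is hypothetical, not part of Perl.  The match runs
inside a C<no locale> block, so C<\w> is not taken from the current
C<LC_CTYPE> tables even though the rest of the file is locale-aware.
It assumes plain byte strings as input.

```perl
use strict;
use warnings;
use locale;     # the rest of the program is locale-aware

# Hypothetical helper: true only for strings made of mundane
# letters, digits, and underscores.
sub is_plain_word {
    my ($string) = @_;
    no locale;  # \w is not taken from LC_CTYPE inside this block
    return $string =~ /\A\w+\z/ ? 1 : 0;
}

my @verdicts = map { is_plain_word($_) } ("make_all", "rm|ls", "");
print "@verdicts\n";
```

The anchors C<\A> and C<\z> matter here: without them a string such as
"rm|ls" would still match on its first word.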

=head2 Category LC_NUMERIC: Numeric Formatting

In the scope of S<C<use locale>>, Perl obeys the C<LC_NUMERIC> locale
information, which controls an application's idea of how numbers should
be formatted for human readability by the printf(), sprintf(), and
write() functions.  String-to-numeric conversion by the POSIX::strtod()
function is also affected.  In most implementations the only effect is to
change the character used for the decimal point--perhaps from '.'  to ','.
These functions aren't aware of such niceties as thousands separation and
so on.  (See L<The localeconv function> if you care about these things.)

Output produced by print() is also affected by the current locale: it
depends on whether C<use locale> or C<no locale> is in effect.  Under
C<no locale> it corresponds to what you'd get from printf() in the "C"
locale.  The same is true for Perl's internal conversions between
numeric and string formats:

        use POSIX qw(strtod);
        use locale;

        $n = 5/2;   # Assign numeric 2.5 to $n

        $a = " $n"; # Locale-dependent conversion to string

        print "half five is $n\n";       # Locale-dependent output

        printf "half five is %g\n", $n;  # Locale-dependent output

        print "DECIMAL POINT IS COMMA\n"
            if $n == (strtod("2,5"))[0]; # Locale-dependent conversion
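
The decimal point character that C<LC_NUMERIC> selects can also be
inspected directly.  This is a minimal sketch, assuming that the "C"
locale is forced first so that the result is predictable; a real
program would normally keep whatever locale its environment selected.

```perl
use strict;
use warnings;
use POSIX qw(setlocale localeconv LC_NUMERIC);

# Pin LC_NUMERIC to the "C" locale so the result is predictable.
setlocale(LC_NUMERIC, "C");

# localeconv() returns a reference to a hash of formatting details.
my $lconv         = localeconv();
my $decimal_point = $lconv->{decimal_point};

print "decimal point is '$decimal_point'\n";
```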

=head2 Category LC_MONETARY: Formatting of monetary amounts

The C standard defines the C<LC_MONETARY> category, but no function
that is affected by its contents.  (Those with experience of standards
committees will recognize that the working group decided to punt on the
issue.)  Consequently, Perl takes no notice of it.  If you really want
to use C<LC_MONETARY>, you can query its contents--see 
L<The localeconv function>--and use the information that it returns in your 
application's own formatting of currency amounts.  However, you may well 
find that the information, voluminous and complex though it may be, still 
does not quite meet your requirements: currency formatting is a hard nut 
to crack.
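
As a rough illustration of such application-level formatting, here is
a sketch built on the hash returned by POSIX::localeconv().  The helper
format_amount() is hypothetical, and it deliberately ignores sign
conventions, symbol placement, and grouping, all of which a real
application would have to handle.  Fields that the locale leaves empty
are assumed to be absent from the hash, so each one gets a fallback.

```perl
use strict;
use warnings;
use POSIX qw(localeconv);

# Hypothetical helper: format an amount from a localeconv()-style hash.
sub format_amount {
    my ($amount, $lconv) = @_;
    my $point  = defined $lconv->{mon_decimal_point}
               ? $lconv->{mon_decimal_point} : ".";
    my $symbol = defined $lconv->{currency_symbol}
               ? $lconv->{currency_symbol} : "";
    my $digits = defined $lconv->{frac_digits}
               ? $lconv->{frac_digits} : 2;

    # Assumes sprintf() emits "." here, since "use locale" is not in effect.
    my $text = sprintf("%.*f", $digits, $amount);
    $text =~ s/\./$point/;
    return length($symbol) ? "$symbol $text" : $text;
}

# A hand-written hash stands in for a subverted or unusual locale.
my $sample = format_amount(1234.5, {
    mon_decimal_point => ",",
    currency_symbol   => "EUR",
    frac_digits       => 2,
});

# The same helper applied to the locale chosen by the environment;
# its output depends on the system it runs on.
my $local = format_amount(1234.5, localeconv());
```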

=head2 LC_TIME

Output produced by POSIX::strftime(), which builds a formatted
human-readable date/time string, is affected by the current C<LC_TIME>
locale.  Thus, in a French locale, the output produced by the C<%B>
format element (full month name) for the first month of the year would
be "janvier".  Here's how to get a list of long month names in the
current locale:

        use POSIX qw(strftime);
        for (0..11) {
            $long_month_name[$_] =
                strftime("%B", 0, 0, 0, 1, $_, 96);
        }

Note: C<use locale> isn't needed in this example: as a function that
exists only to generate locale-dependent results, strftime() always
obeys the current C<LC_TIME> locale.
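
The same approach works for day names.  This sketch pins C<LC_TIME> to
the "C" locale purely so that the result is predictable; that choice is
an assumption of the example, and removing the setlocale() call gives
the names for the locale selected by the environment.

```perl
use strict;
use warnings;
use POSIX qw(strftime setlocale LC_TIME);

# Pin LC_TIME to "C" so the names are predictable.
setlocale(LC_TIME, "C");

# 7 January 1996 was a Sunday; strftime() works out the weekday
# from the day, month, and year it is given.
my @long_day_name;
for my $offset (0 .. 6) {
    $long_day_name[$offset] =
        strftime("%A", 0, 0, 0, 7 + $offset, 0, 96);
}

print "@long_day_name\n";
```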

=head2 Other categories

The remaining locale category, C<LC_MESSAGES> (possibly supplemented
by others in particular implementations) is not currently used by
Perl--except possibly to affect the behavior of library functions
called by extensions outside the standard Perl distribution and by the
operating system and its utilities.  Note especially that the string
value of C<$!> and the error messages given by external utilities may
be changed by C<LC_MESSAGES>.  If you want to have portable error
codes, use C<%!>.  See L<Errno>.
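
For instance, a test for a missing file can be written without looking
at the message text at all.  This is a minimal sketch; the path is
hypothetical and is assumed not to exist.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Hypothetical path, assumed not to exist.
my $path = "/no/such/directory/no-such-file";

my $missing = 0;
if (!open(my $fh, "<", $path)) {
    # The string value of $! may be translated; the keys of %! are not.
    $missing = 1 if $!{ENOENT};
}

print "file is missing\n" if $missing;
```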

=head1 SECURITY

Although the main discussion of Perl security issues can be found in
L<perlsec>, a discussion of Perl's locale handling would be incomplete
if it did not draw your attention to locale-dependent security issues.
Locales--particularly on systems that allow unprivileged users to
build their own locales--are untrustworthy.  A malicious (or just plain
broken) locale can make a locale-aware application give unexpected
results.  Here are a few possibilities:

=over 4

=item *

Regular expression checks for safe file names or mail addresses using
C<\w> may be spoofed by an C<LC_CTYPE> locale that claims that
characters such as "E<gt>" and "|" are alphanumeric.

=item *

String interpolation with case-mapping, as in, say, C<$dest =
"C:\U$name.$ext">, may produce dangerous results if a bogus LC_CTYPE
case-mapping table is in effect.

=item *

A sneaky C<LC_COLLATE> locale could result in the names of students with
"D" grades appearing ahead of those with "A"s.

=item *

An application that takes the trouble to use information in
C<LC_MONETARY> may format debits as if they were credits and vice versa
if that locale has been subverted.  Or it might make payments in US
dollars instead of Hong Kong dollars.

=item *

The date and day names in dates formatted by strftime() could be
manipulated to advantage by a malicious user able to subvert the
C<LC_TIME> locale.  ("Look--it says I wasn't in the building on
Sunday.")

=back

Such dangers are not peculiar to the locale system: any aspect of an
application's environment which may be modified maliciously presents
similar challenges.  Similarly, they are not specific to Perl: any
programming language that allows you to write programs that take
account of their environment exposes you to these issues.

Perl cannot protect you from all possibilities shown in the
examples--there is no substitute for your own vigilance--but, when
C<use locale> is in effect, Perl uses the tainting mechanism (see
L<perlsec>) to mark string results that become locale-dependent, and
which may be untrustworthy in consequence.  Here is a summary of the
tainting behavior of operators and functions that may be affected by
the locale:

=over 4

=item  *

B<Comparison operators> (C<lt>, C<le>, C<ge>, C<gt> and C<cmp>):

Scalar true/false (or less/equal/greater) result is never tainted.

=item  *

B<Case-mapping interpolation> (with C<\l>, C<\L>, C<\u> or C<\U>)

Result string containing interpolated material is tainted if
C<use locale> is in effect.

=item  *

B<Matching operator> (C<m//>):

Scalar true/false result never tainted.

Subpatterns, either delivered as a list-context result or as $1 etc.
are tainted if C<use locale> is in effect, and the subpattern regular
expression contains C<\w> (to match an alphanumeric character), C<\W>
(non-alphanumeric character), C<\s> (white-space character), or C<\S>
(non white-space character).  The matched-pattern variables, $&, $`
(pre-match), $' (post-match), and $+ (last match) are also tainted if
C<use locale> is in effect and the regular expression contains C<\w>,
C<\W>, C<\s>, or C<\S>.

=item  *

B<Substitution operator> (C<s///>):

Has the same behavior as the match operator.  Also, the left
operand of C<=~> becomes tainted when C<use locale> is in effect
if modified as a result of a substitution based on a regular
expression match involving C<\w>, C<\W>, C<\s>, or C<\S>; or of
case-mapping with C<\l>, C<\L>, C<\u> or C<\U>.

=item  *

B<Output formatting functions> (printf() and write()):

Results are never tainted because otherwise even output from print,
for example C<print(1/7)>, should be tainted if C<use locale> is in
effect.

=item  *

B<Case-mapping functions> (lc(), lcfirst(), uc(), ucfirst()):

Results are tainted if C<use locale> is in effect.

=item  *

B<POSIX locale-dependent functions> (localeconv(), strcoll(),
strftime(), strxfrm()):

Results are never tainted.

=item  *

B<POSIX character class tests> (isalnum(), isalpha(), isdigit(),
isgraph(), islower(), isprint(), ispunct(), isspace(), isupper(),
isxdigit()):

True/false results are never tainted.

=back

Three examples illustrate locale-dependent tainting.
The first program, which ignores its locale, won't run: a value taken
directly from the command line may not be used to name an output file
when taint checks are enabled.

        #!/usr/local/bin/perl -T
        # Run with taint checking

        # Command line sanity check omitted...
        $tainted_output_file = shift;

        open(F, ">$tainted_output_file")
            or warn "Open of $tainted_output_file failed: $!\n";

The program can be made to run by "laundering" the tainted value through
a regular expression: the second example--which still ignores locale
information--runs, creating the file named on its command line
if it can.

        #!/usr/local/bin/perl -T

        $tainted_output_file = shift;
        $tainted_output_file =~ m%[\w/]+%;
        $untainted_output_file = $&;

        open(F, ">$untainted_output_file")
            or warn "Open of $untainted_output_file failed: $!\n";

Compare this with a similar but locale-aware program:

        #!/usr/local/bin/perl -T

        $tainted_output_file = shift;
        use locale;
        $tainted_output_file =~ m%[\w/]+%;
        $localized_output_file = $&;

        open(F, ">$localized_output_file")
            or warn "Open of $localized_output_file failed: $!\n";

This third program fails to run because $& is tainted: it is the result
of a match involving C<\w> while C<use locale> is in effect.

=head1 ENVIRONMENT

=over 12

=item PERL_BADLANG

A string that can suppress Perl's warning about failed locale settings
at startup.  Failure can occur if the locale support in the operating
system is lacking (broken) in some way--or if you mistyped the name of
a locale when you set up your environment.  If this environment
variable is absent, or has a value that does not evaluate to integer
zero--that is, "0" or ""--Perl will complain about locale setting
failures.

B<NOTE>: PERL_BADLANG only gives you a way to hide the warning message.
The message tells about some problem in your system's locale support,
and you should investigate what the problem is.

=back

The following environment variables are not specific to Perl: They are
part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale() method
for controlling an application's opinion on data.

=over 12

=item LC_ALL

C<LC_ALL> is the "override-all" locale environment variable. If
set, it overrides all the rest of the locale environment variables.

=item LANGUAGE

B<NOTE>: C<LANGUAGE> is a GNU extension; it affects you only if you
are using the GNU libc.  This is the case if you are using e.g. Linux.
If you are using "commercial" UNIXes you are most probably I<not>
using GNU libc and you can ignore C<LANGUAGE>.

However, in the case you are using C<LANGUAGE>: it affects the
language of informational, warning, and error messages output by
commands (in other words, it's like C<LC_MESSAGES>) but it has higher
priority than C<LC_ALL>.  Moreover, it's not a single value but
instead a "path" (":"-separated list) of I<languages> (not locales).
See the GNU C<gettext> library documentation for more information.

=item LC_CTYPE

In the absence of C<LC_ALL>, C<LC_CTYPE> chooses the character type
locale.  In the absence of both C<LC_ALL> and C<LC_CTYPE>, C<LANG>
chooses the character type locale.

=item LC_COLLATE

In the absence of C<LC_ALL>, C<LC_COLLATE> chooses the collation
(sorting) locale.  In the absence of both C<LC_ALL> and C<LC_COLLATE>,
C<LANG> chooses the collation locale.

=item LC_MONETARY

In the absence of C<LC_ALL>, C<LC_MONETARY> chooses the monetary
formatting locale.  In the absence of both C<LC_ALL> and C<LC_MONETARY>,
C<LANG> chooses the monetary formatting locale.

=item LC_NUMERIC

In the absence of C<LC_ALL>, C<LC_NUMERIC> chooses the numeric format
locale.  In the absence of both C<LC_ALL> and C<LC_NUMERIC>, C<LANG>
chooses the numeric format.

=item LC_TIME

In the absence of C<LC_ALL>, C<LC_TIME> chooses the date and time
formatting locale.  In the absence of both C<LC_ALL> and C<LC_TIME>,
C<LANG> chooses the date and time formatting locale.

=item LANG

C<LANG> is the "catch-all" locale environment variable. If it is set, it
is used as the last resort after the overall C<LC_ALL> and the
category-specific C<LC_...>.

=back
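
The precedence described above (C<LC_ALL>, then the category-specific
variable, then C<LANG>) can be sketched as a small lookup.  The helper
requested_locale() is hypothetical.  It treats an empty variable as
unset and falls back to "C" when nothing is set; both are assumptions
of the sketch, and the GNU C<LANGUAGE> variable is not considered.

```perl
use strict;
use warnings;

# Hypothetical helper: name of the locale the environment requests
# for one category, following LC_ALL > LC_<category> > LANG.
sub requested_locale {
    my ($category, $env) = @_;
    for my $name ("LC_ALL", $category, "LANG") {
        my $value = $env->{$name};
        # An unset or empty variable is skipped.
        return $value if defined($value) && length($value);
    }
    return "C";   # nothing set: assume the "C" locale
}

my $time_locale = requested_locale(
    "LC_TIME", { LANG => "en_US", LC_TIME => "fr_FR" }
);
print "LC_TIME would come from $time_locale\n";
```

Calling C<requested_locale("LC_TIME", \%ENV)> applies the same lookup
to the real environment.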

=head1 NOTES

=head2 Backward compatibility

Versions of Perl prior to 5.004 B<mostly> ignored locale information,
generally behaving as if something similar to the C<"C"> locale were
always in force, even if the program environment suggested otherwise
(see L<The setlocale function>).  By default, Perl still behaves this
way for backward compatibility.  If you want a Perl application to pay
attention to locale information, you B<must> use the S<C<use locale>>
pragma (see L<The use locale pragma>) to instruct it to do so.

Versions of Perl from 5.002 to 5.003 did use the C<LC_CTYPE>
information if available; that is, C<\w> did understand what
were the letters according to the locale environment variables.
The problem was that the user had no control over the feature:
if the C library supported locales, Perl used them.

=head2 I18N::Collate obsolete

In versions of Perl prior to 5.004, per-locale collation was possible
using the C<I18N::Collate> library module.  This module is now mildly
obsolete and should be avoided in new applications.  The C<LC_COLLATE>
functionality is now integrated into the Perl core language: One can
use locale-specific scalar data completely normally with C<use locale>,
so there is no longer any need to juggle with the scalar references of
C<I18N::Collate>.

=head2 Sort speed and memory use impacts

Comparing and sorting by locale is usually slower than the default
sorting; slow-downs of two to four times have been observed.  It will
also consume more memory: once a Perl scalar variable has participated
in any string comparison or sorting operation obeying the locale
collation rules, it will take 3-15 times more memory than before.  (The
exact multiplier depends on the string's contents, the operating system
and the locale.) These downsides are dictated more by the operating
system's implementation of the locale system than by Perl.

=head2 write() and LC_NUMERIC

Formats are the only part of Perl that unconditionally uses information
from a program's locale; if a program's environment specifies an
LC_NUMERIC locale, it is always used to specify the decimal point
character in formatted output.  Formatted output cannot be controlled by
C<use locale> because the pragma is tied to the block structure of the
program, and, for historical reasons, formats exist outside that block
structure.

=head2 Freely available locale definitions

There is a large collection of locale definitions at
C<ftp://dkuug.dk/i18n/WG15-collection>.  You should be aware that it is
unsupported, and is not claimed to be fit for any purpose.  If your
system allows installation of arbitrary locales, you may find the
definitions useful as they are, or as a basis for the development of
your own locales.

=head2 I18n and l10n

"Internationalization" is often abbreviated as B<i18n> because its first
and last letters are separated by eighteen others.  (You may guess why
the internalin ... internaliti ... i18n tends to get abbreviated.)  In
the same way, "localization" is often abbreviated to B<l10n>.

=head2 An imperfect standard

Internationalization, as defined in the C and POSIX standards, can be
criticized as incomplete, ungainly, and having too large a granularity.
(Locales apply to a whole process, when it would arguably be more useful
to have them apply to a single thread, window group, or whatever.)  They
also have a tendency, like standards groups, to divide the world into
nations, when we all know that the world can equally well be divided
into bankers, bikers, gamers, and so on.  But, for now, it's the only
standard we've got.  This may be construed as a bug.

=head1 BUGS

=head2 Broken systems

In certain systems, the operating system's locale support
is broken and cannot be fixed or used by Perl.  Such deficiencies can
and will result in mysterious hangs and/or Perl core dumps when
C<use locale> is in effect.  When confronted with such a system,
please report in excruciating detail to <F<perlbug@perl.org>>, and
complain to your vendor: bug fixes may exist for these problems
in your operating system.  Sometimes such bug fixes are called an
operating system upgrade.

=head1 SEE ALSO

L<POSIX/isalnum>, L<POSIX/isalpha>, L<POSIX/isdigit>, 
L<POSIX/isgraph>, L<POSIX/islower>, L<POSIX/isprint>, 
L<POSIX/ispunct>, L<POSIX/isspace>, L<POSIX/isupper>, 
L<POSIX/isxdigit>, L<POSIX/localeconv>, L<POSIX/setlocale>, 
L<POSIX/strcoll>, L<POSIX/strftime>, L<POSIX/strtod>, 
L<POSIX/strxfrm>.

=head1 HISTORY

Jarkko Hietaniemi's original F<perli18n.pod> heavily hacked by Dominic
Dunlop, assisted by the perl5-porters.  Prose worked over a bit by
Tom Christiansen.

Last update: Thu Jun 11 08:44:13 MDT 1998

=head1 NAME

perllol - Manipulating Arrays of Arrays in Perl

=head1 DESCRIPTION

=head2 Declaration and Access of Arrays of Arrays

The simplest thing to build is an array of arrays (sometimes imprecisely
called a list of lists).  It's reasonably easy to understand, and
almost everything that applies here will also be applicable later
on with the fancier data structures.

An array of arrays is just a regular old array @AoA that you can
get at with two subscripts, like C<$AoA[3][2]>.  Here's a declaration
of the array:

    # assign to our array, an array of array references
    @AoA = (
	   [ "fred", "barney" ],
	   [ "george", "jane", "elroy" ],
	   [ "homer", "marge", "bart" ],
    );

    print $AoA[2][2];
  bart

Now you should be very careful that the outer bracket type
is a round one, that is, a parenthesis.  That's because you're assigning to
an @array, so you need parentheses.  If you wanted there I<not> to be an @AoA,
but rather just a reference to it, you could do something more like this:

    # assign a reference to array of array references
    $ref_to_AoA = [
	[ "fred", "barney", "pebbles", "bambam", "dino", ],
	[ "homer", "bart", "marge", "maggie", ],
	[ "george", "jane", "elroy", "judy", ],
    ];

    print $ref_to_AoA->[2][2];

Notice that the outer bracket type has changed, and so our access syntax
has also changed.  That's because unlike C, in perl you can't freely
interchange arrays and references thereto.  $ref_to_AoA is a reference to an
array, whereas @AoA is an array proper.  Likewise, C<$AoA[2]> is not an
array, but an array ref.  So how come you can write these:

    $AoA[2][2]
    $ref_to_AoA->[2][2]

instead of having to write these:

    $AoA[2]->[2]
    $ref_to_AoA->[2]->[2]

Well, that's because the rule is that on adjacent brackets only (whether
square or curly), you are free to omit the pointer dereferencing arrow.
But you cannot do so for the very first one if it's a scalar containing
a reference, which means that $ref_to_AoA always needs it.
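
The following sketch, using sample data like that above, checks that
all four spellings reach the same element.

```perl
use strict;
use warnings;

my @AoA = (
    [ "fred",   "barney" ],
    [ "george", "jane",  "elroy" ],
    [ "homer",  "marge", "bart" ],
);
my $ref_to_AoA = \@AoA;

# All four expressions name the same element, "bart".
my @same = (
    $AoA[2][2],
    $AoA[2]->[2],
    $ref_to_AoA->[2][2],
    $ref_to_AoA->[2]->[2],
);

# Without the first arrow, $ref_to_AoA[2][2] would refer to an
# array named @ref_to_AoA, which is a different variable.
print "@same\n";
```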

=head2 Growing Your Own

That's all well and good for declaration of a fixed data structure,
but what if you wanted to add new elements on the fly, or build
it up entirely from scratch?

First, let's look at reading it in from a file.  This is something like
adding a row at a time.  We'll assume that there's a flat file in which
each line is a row and each word an element.  If you're trying to develop an
@AoA array containing all these, here's the right way to do that:

    while (<>) {
	@tmp = split;
	push @AoA, [ @tmp ];
    }

You might also have loaded that from a function:

    for $i ( 1 .. 10 ) {
	$AoA[$i] = [ somefunc($i) ];
    }

Or you might have had a temporary variable sitting around with the
array in it.

    for $i ( 1 .. 10 ) {
	@tmp = somefunc($i);
	$AoA[$i] = [ @tmp ];
    }

It's very important that you make sure to use the C<[]> array reference
constructor.  That's because this will be very wrong:

    $AoA[$i] = @tmp;

You see, assigning a named array like that to a scalar just counts the
number of elements in @tmp, which probably isn't what you want.
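
A short sketch makes the difference visible; the sample names are
only for illustration.

```perl
use strict;
use warnings;

my @tmp = ("fred", "barney", "pebbles");
my @AoA;

$AoA[0] = @tmp;       # scalar context: stores the element count
$AoA[1] = [ @tmp ];   # stores a reference to a fresh copy of @tmp

my $count     = $AoA[0];
my $first_elt = $AoA[1][0];

print "row 0 holds $count, row 1 starts with $first_elt\n";
```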

If you are running under C<use strict>, you'll have to add some
declarations to make it happy:

    use strict;
    my(@AoA, @tmp);
    while (<>) {
	@tmp = split;
	push @AoA, [ @tmp ];
    }

Of course, you don't need the temporary array to have a name at all:

    while (<>) {
	push @AoA, [ split ];
    }

You also don't have to use push().  You could just make a direct assignment
if you knew where you wanted to put it:

    my (@AoA, $i, $line);
    for $i ( 0 .. 10 ) {
	$line = <>;
	$AoA[$i] = [ split ' ', $line ];
    }

or even just

    my (@AoA, $i);
    for $i ( 0 .. 10 ) {
	$AoA[$i] = [ split ' ', <> ];
    }

You should in general be leery of using functions that could
potentially return lists in scalar context without explicitly stating
such.  This would be clearer to the casual reader:

    my (@AoA, $i);
    for $i ( 0 .. 10 ) {
	$AoA[$i] = [ split ' ', scalar(<>) ];
    }

If you wanted to have a $ref_to_AoA variable as a reference to an array,
you'd have to do something like this:

    while (<>) {
	push @$ref_to_AoA, [ split ];
    }

Now you can add new rows.  What about adding new columns?  If you're
dealing with just matrices, it's often easiest to use simple assignment:

    for $x (1 .. 10) {
	for $y (1 .. 10) {
	    $AoA[$x][$y] = func($x, $y);
	}
    }

    for $x ( 3, 7, 9 ) {
	$AoA[$x][20] += func2($x);
    }

It doesn't matter whether those elements are already
there or not: it'll gladly create them for you, setting
intervening elements to C<undef> as need be.
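
Here is a small sketch of that behavior, starting from an empty array.

```perl
use strict;
use warnings;

my @AoA;
$AoA[2][3] = "x";   # creates row 2; rows 0 and 1 stay undef,
                    # as do columns 0 .. 2 of row 2

my $row_count    = @AoA;             # number of rows
my $column_count = @{ $AoA[2] };     # number of elements in row 2
my $row0_defined = defined $AoA[0] ? 1 : 0;

print "$row_count rows, $column_count columns in row 2\n";
```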

If you wanted just to append to a row, you'd have
to do something a bit funnier looking:

    # add new columns to an existing row
    push @{ $AoA[0] }, "wilma", "betty";

Notice that I I<couldn't> say just:

    push $AoA[0], "wilma", "betty";  # WRONG!

In fact, that wouldn't even compile.  How come?  Because the argument
to push() must be a real array, not just a reference to such.

=head2 Access and Printing

Now it's time to print your data structure out.  How
are you going to do that?  Well, if you want only one
of the elements, it's trivial:

    print $AoA[0][0];

If you want to print the whole thing, though, you can't
say

    print @AoA;		# WRONG

because you'll get just references listed, and perl will never
automatically dereference things for you.  Instead, you have to
roll yourself a loop or two.  This prints the whole structure,
using the shell-style for() construct to loop across the outer
set of subscripts.

    for $aref ( @AoA ) {
	print "\t [ @$aref ],\n";
    }

If you wanted to keep track of subscripts, you might do this:

    for $i ( 0 .. $#AoA ) {
	print "\t elt $i is [ @{$AoA[$i]} ],\n";
    }

or maybe even this.  Notice the inner loop.

    for $i ( 0 .. $#AoA ) {
	for $j ( 0 .. $#{$AoA[$i]} ) {
	    print "elt $i $j is $AoA[$i][$j]\n";
	}
    }

As you can see, it's getting a bit complicated.  That's why
sometimes it's easier to take a temporary on your way through:

    for $i ( 0 .. $#AoA ) {
	$aref = $AoA[$i];
	for $j ( 0 .. $#{$aref} ) {
	    print "elt $i $j is $AoA[$i][$j]\n";
	}
    }

Hmm... that's still a bit ugly.  How about this:

    for $i ( 0 .. $#AoA ) {
	$aref = $AoA[$i];
	$n = @$aref - 1;
	for $j ( 0 .. $n ) {
	    print "elt $i $j is $AoA[$i][$j]\n";
	}
    }
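
Once you have the temporary, you can also subscript through it
directly.  This sketch uses lexical variables and collects the lines in
an array (an assumption made so that they can be inspected) before
printing them.

```perl
use strict;
use warnings;

my @AoA = (
    [ "fred",   "barney" ],
    [ "george", "jane", "elroy" ],
);

my @lines;
for my $i ( 0 .. $#AoA ) {
    my $aref = $AoA[$i];               # reference to row $i
    for my $j ( 0 .. $#{$aref} ) {
        push @lines, "elt $i $j is $aref->[$j]";
    }
}

print "$_\n" for @lines;
```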

=head2 Slices

If you want to get at a slice (part of a row) in a multidimensional
array, you're going to have to do some fancy subscripting.  That's
because while we have a nice synonym for single elements via the
pointer arrow for dereferencing, no such convenience exists for slices.
(Remember, of course, that you can always write a loop to do a slice
operation.)

Here's how to do one operation using a loop.  We'll assume an @AoA
variable as before.

    @part = ();
    $x = 4;
    for ($y = 7; $y < 13; $y++) {
	push @part, $AoA[$x][$y];
    }

That same loop could be replaced with a slice operation:

    @part = @{ $AoA[4] } [ 7..12 ];

but as you might well imagine, this is pretty rough on the reader.

Ah, but what if you wanted a I<two-dimensional slice>, such as having
$x run from 4..8 and $y run from 7 to 12?  Hmm... here's the simple way:

    @newAoA = ();
    for ($startx = $x = 4; $x <= 8; $x++) {
	for ($starty = $y = 7; $y <= 12; $y++) {
	    $newAoA[$x - $startx][$y - $starty] = $AoA[$x][$y];
	}
    }

We can reduce some of the looping through slices:

    for ($x = 4; $x <= 8; $x++) {
	push @newAoA, [ @{ $AoA[$x] } [ 7..12 ] ];
    }

If you were into Schwartzian Transforms, you would probably
have selected map for that:

    @newAoA = map { [ @{ $AoA[$_] } [ 7..12 ] ] } 4 .. 8;

Although if your manager accused you of seeking job security (or rapid
insecurity) through inscrutable code, it would be hard to argue. :-)
If I were you, I'd put that in a function:

    @newAoA = splice_2D( \@AoA, 4 => 8, 7 => 12 );
    sub splice_2D {
	my $lrr = shift; 	# ref to array of array refs!
	my ($x_lo, $x_hi,
	    $y_lo, $y_hi) = @_;

	return map {
	    [ @{ $lrr->[$_] } [ $y_lo .. $y_hi ] ]
	} $x_lo .. $x_hi;
    }
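
To see what the function returns, here is a self-contained sketch that
repeats it under C<use strict> and runs it on sample data.  The data is
an assumption of the example: each element holds its own subscripts as
a string.

```perl
use strict;
use warnings;

sub splice_2D {
    my $lrr = shift;    # ref to array of array refs
    my ($x_lo, $x_hi, $y_lo, $y_hi) = @_;
    return map {
        [ @{ $lrr->[$_] } [ $y_lo .. $y_hi ] ]
    } $x_lo .. $x_hi;
}

# Sample data: element [$x][$y] holds the string "$x,$y".
my @AoA;
for my $x ( 0 .. 9 ) {
    for my $y ( 0 .. 14 ) {
        $AoA[$x][$y] = "$x,$y";
    }
}

my @newAoA = splice_2D( \@AoA, 4 => 8, 7 => 12 );
print "corner elements: $newAoA[0][0] and $newAoA[4][5]\n";
```

The result has five rows of six elements each, and its subscripts
start again from zero.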


=head1 SEE ALSO

perldata(1), perlref(1), perldsc(1)

=head1 AUTHOR

Tom Christiansen <F<tchrist@perl.com>>

Last update: Thu Jun  4 16:16:23 MDT 1998
A/b? < @  <@  #  #                                       ??    ?  ?  ?                                                                                                                              	           	ɐɐ        		      ə           	     			         	   ə   		                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         =head1 NAME

perlmod - Perl modules (packages and symbol tables)

=head1 DESCRIPTION

=head2 Packages

Perl provides a mechanism for alternative namespaces to protect
packages from stomping on each other's variables.  In fact, there's
really no such thing as a global variable in Perl.  The package
statement declares the compilation unit as being in the given
namespace.  The scope of the package declaration is from the
declaration itself through the end of the enclosing block, C<eval>,
or file, whichever comes first (the same scope as the my() and
local() operators).  Unqualified dynamic identifiers will be in
this namespace, except for those few identifiers that if unqualified,
default to the main package instead of the current one as described
below.  A package statement affects only dynamic variables--including
those you've used local() on--but I<not> lexical variables created
with my().  Typically it would be the first declaration in a file
included by the C<do>, C<require>, or C<use> operators.  You can
switch into a package in more than one place; it merely influences
which symbol table is used by the compiler for the rest of that
block.  You can refer to variables and filehandles in other packages
by prefixing the identifier with the package name and a double
colon: C<$Package::Variable>.  If the package name is null, the
C<main> package is assumed.  That is, C<$::sail> is equivalent to
C<$main::sail>.
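
A minimal sketch of qualified names follows.  The package name C<Boat>
and the values are hypothetical.

```perl
use strict;
use warnings;

package main;
our $sail = "mainsail";

{
    package Boat;
    our $sail = "jib";      # a different variable: $Boat::sail
}

# $::sail is shorthand for $main::sail.
my @seen = ($main::sail, $::sail, $Boat::sail);
print "@seen\n";
```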

The old package delimiter was a single quote, but double colon is now the
preferred delimiter, in part because it's more readable to humans, and
in part because it's more readable to B<emacs> macros.  It also makes C++
programmers feel like they know what's going on--as opposed to using the
single quote as separator, which was there to make Ada programmers feel
like they knew what was going on.  Because the old-fashioned syntax is still
supported for backwards compatibility, if you try to use a string like
C<"This is $owner's house">, you'll be accessing C<$owner::s>; that is,
the $s variable in package C<owner>, which is probably not what you meant.
Use braces to disambiguate, as in C<"This is ${owner}'s house">.

Packages may themselves contain package separators, as in
C<$OUTER::INNER::var>.  This implies nothing about the order of
name lookups, however.  There are no relative packages: all symbols
are either local to the current package, or must be fully qualified
from the outer package name down.  For instance, there is nowhere
within package C<OUTER> that C<$INNER::var> refers to
C<$OUTER::INNER::var>.  It would treat package C<INNER> as a totally
separate global package.

Only identifiers starting with letters (or underscore) are stored
in a package's symbol table.  All other symbols are kept in package
C<main>, including all punctuation variables, like $_.  In addition,
when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV,
ARGVOUT, ENV, INC, and SIG are forced to be in package C<main>,
even when used for other purposes than their built-in one.  If you
have a package called C<m>, C<s>, or C<y>, then you can't use the
qualified form of an identifier because it would be instead interpreted
as a pattern match, a substitution, or a transliteration.

Variables beginning with underscore used to be forced into package
main, but we decided it was more useful for package writers to be able
to use leading underscore to indicate private variables and method names.
$_ is still global though.  See also
L<perlvar/"Technical Note on the Syntax of Variable Names">.

C<eval>ed strings are compiled in the package in which the eval() was
compiled.  (Assignments to C<$SIG{}>, however, assume the signal
handler specified is in the C<main> package.  Qualify the signal handler
name if you wish to have a signal handler in a package.)  For an
example, examine F<perldb.pl> in the Perl library.  It initially switches
to the C<DB> package so that the debugger doesn't interfere with variables
in the program you are trying to debug.  At various points, however, it
temporarily switches back to the C<main> package to evaluate various
expressions in the context of the C<main> package (or wherever you came
from).  See L<perldebug>.

The special symbol C<__PACKAGE__> contains the current package, but cannot
(easily) be used to construct variables.
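
One workaround is a symbolic reference built from the package name.
The sketch below is only an illustration: the package C<Some::Pkg>,
the variable C<$name>, and the helper C<lookup()> are made up for the
example, and the technique reaches only package variables, never my()
lexicals.

    package Some::Pkg;
    use strict;
    use warnings;

    our $name = 'fred';

    # Fetch a package scalar by name, relative to the current package.
    sub lookup {
        my $var = shift;
        no strict 'refs';                    # symbolic reference ahead
        return ${ __PACKAGE__ . "::$var" };  # e.g. $Some::Pkg::name
    }

    print lookup('name'), "\n";              # prints "fred"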

See L<perlsub> for other scoping issues related to my() and local(),
and L<perlref> regarding closures.

=head2 Symbol Tables

The symbol table for a package happens to be stored in the hash of that
name with two colons appended.  The main symbol table's name is thus
C<%main::>, or C<%::> for short.  Likewise the symbol table for the nested
package mentioned earlier is named C<%OUTER::INNER::>.

The value in each entry of the hash is what you are referring to when you
use the C<*name> typeglob notation.  In fact, the following have the same
effect, though the first is more efficient because it does the symbol
table lookups at compile time:

    local *main::foo    = *main::bar;
    local $main::{foo}  = $main::{bar};

(Be sure to note the B<vast> difference between the second line above
and C<local $main::foo = $main::bar>. The former is accessing the hash
C<%main::>, which is the symbol table of package C<main>. The latter is
simply assigning scalar C<$bar> in package C<main> to scalar C<$foo> of
the same package.)

You can use this to print out all the variables in a package, for
instance.  The standard but antiquated F<dumpvar.pl> library and
the CPAN module Devel::Symdump make use of this.
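
As a minimal sketch of that idea, the following walks the symbol table
hash of a small package and prints the names stored in it.  The package
C<Demo> and its contents are hypothetical, chosen only to give the loop
something to find.

    use strict;
    use warnings;

    package Demo;
    our $scalar = 1;
    our @array  = (1, 2);
    sub func { }

    package main;

    # %Demo:: is the symbol table of package Demo; its keys are the
    # identifiers (without sigils) known to that package.
    for my $name (sort keys %Demo::) {
        print "Demo::$name\n";
    }

The listing includes C<array>, C<func>, and C<scalar>.  Because one
entry covers every kind of thing with that name, the sigil is not part
of the key.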

Assignment to a typeglob performs an aliasing operation, i.e.,

    *dick = *richard;

causes variables, subroutines, formats, and file and directory handles
accessible via the identifier C<richard> also to be accessible via the
identifier C<dick>.  If you want to alias only a particular variable or
subroutine, assign a reference instead:

    *dick = \$richard;

This makes $richard and $dick the same variable, but leaves
@richard and @dick as separate arrays.  Tricky, eh?
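
A short sketch of the difference, assuming nothing beyond the two
names used above:

    use strict;
    use warnings;

    our ($richard, @richard, $dick, @dick);
    $richard = 'original';
    @richard = (1, 2, 3);

    *dick = \$richard;          # alias only the scalar slot

    $dick = 'changed';          # writes through to $richard
    print "$richard\n";         # prints "changed"
    print scalar(@dick), "\n";  # prints 0: @dick is still its own array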

This mechanism may be used to pass and return cheap references
into or from subroutines if you don't want to copy the whole
thing.  It only works when assigning to dynamic variables, not
lexicals.

    %some_hash = ();			# can't be my()
    *some_hash = fn( \%another_hash );
    sub fn {
	local *hashsym = shift;
	# now use %hashsym normally, and you
	# will affect the caller's %another_hash
	my %nhash = (); # do what you want
	return \%nhash;
    }

On return, the reference will overwrite the hash slot in the
symbol table specified by the *some_hash typeglob.  This
is a somewhat tricky way of passing around references cheaply
when you don't want to have to remember to dereference variables
explicitly.

Another use of symbol tables is for making "constant" scalars.

    *PI = \3.14159265358979;

Now you cannot alter C<$PI>, which is probably a good thing all in all.
This isn't the same as a constant subroutine, which is subject to
optimization at compile-time.  A constant subroutine is one prototyped
to take no arguments and to return a constant expression.  See 
L<perlsub> for details on these.  The C<use constant> pragma is a
convenient shorthand for these.
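
To see the read-only behavior in action, here is a small sketch that
wraps the attempted assignment in C<eval> so that the program survives
the resulting exception:

    use strict;
    use warnings;

    our $PI;
    *PI = \3.14159265358979;    # $PI now aliases a read-only literal

    print "PI is $PI\n";

    # Assigning to $PI dies with "Modification of a read-only value
    # attempted"; trap it so we can report it.
    eval { $PI = 3; 1 } or print "Assignment failed: $@";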

You can say C<*foo{PACKAGE}> and C<*foo{NAME}> to find out what name and
package the *foo symbol table entry comes from.  This may be useful
in a subroutine that gets passed typeglobs as arguments:

    sub identify_typeglob {
        my $glob = shift;
        print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
    }
    identify_typeglob *foo;
    identify_typeglob *bar::baz;

This prints

    You gave me main::foo
    You gave me bar::baz

The C<*foo{THING}> notation can also be used to obtain references to the
individual elements of *foo.  See L<perlref>.

Subroutine definitions (and declarations, for that matter) need
not necessarily be situated in the package whose symbol table they
occupy.  You can define a subroutine outside its package by
explicitly qualifying the name of the subroutine:

    package main;
    sub Some_package::foo { ... }   # &foo defined in Some_package

This is just a shorthand for a typeglob assignment at compile time:

    BEGIN { *Some_package::foo = sub { ... } }

and is I<not> the same as writing:

    {
	package Some_package;
	sub foo { ... }
    }

In the first two versions, the body of the subroutine is
lexically in the main package, I<not> in Some_package. So
something like this:

    package main;

    $Some_package::name = "fred";
    $main::name = "barney";

    sub Some_package::foo {
	print "in ", __PACKAGE__, ": \$name is '$name'\n";
    }

    Some_package::foo();

prints:

    in main: $name is 'barney'

rather than:

    in Some_package: $name is 'fred'

This also has implications for the use of the SUPER:: qualifier
(see L<perlobj>).

=head2 Package Constructors and Destructors

Four special subroutines act as package constructors and destructors.
These are the C<BEGIN>, C<CHECK>, C<INIT>, and C<END> routines.  The
C<sub> is optional for these routines.

A C<BEGIN> subroutine is executed as soon as possible, that is, the moment
it is completely defined, even before the rest of the containing file
is parsed.  You may have multiple C<BEGIN> blocks within a file--they
will execute in order of definition.  Because a C<BEGIN> block executes
immediately, it can pull in definitions of subroutines and such from other
files in time to be visible to the rest of the file.  Once a C<BEGIN>
has run, it is immediately undefined and any code it used is returned to
Perl's memory pool.  This means you can't ever explicitly call a C<BEGIN>.

An C<END> subroutine is executed as late as possible, that is, after
perl has finished running the program and just before the interpreter
exits, even if it is exiting as a result of a die() function.
(But not if it's polymorphing into another program via C<exec>, or
being blown out of the water by a signal--you have to trap that yourself
(if you can).)  You may have multiple C<END> blocks within a file--they
will execute in reverse order of definition; that is: last in, first
out (LIFO).  C<END> blocks are not executed when you run perl with the
C<-c> switch, or if compilation fails.

Inside an C<END> subroutine, C<$?> contains the value that the program is
going to pass to C<exit()>.  You can modify C<$?> to change the exit
value of the program.  Beware of changing C<$?> by accident (e.g. by
running something via C<system>).

Similar to C<BEGIN> blocks, C<INIT> blocks are run just before the
Perl runtime begins execution, in "first in, first out" (FIFO) order.
For example, the code generators documented in L<perlcc> make use of
C<INIT> blocks to initialize and resolve pointers to XSUBs.

Similar to C<END> blocks, C<CHECK> blocks are run just after the
Perl compile phase ends and before the run time begins, in
LIFO order.  C<CHECK> blocks are again useful in the Perl compiler
suite to save the compiled state of the program.

When you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and
C<END> work just as they do in B<awk>, as a degenerate case.
Both C<BEGIN> and C<CHECK> blocks are run when you use the B<-c>
switch for a compile-only syntax check, although your main code
is not.
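
The ordering rules above can be checked with a small script.  This is
only a sketch; the array C<@order> is a name invented for the example,
and the script is meant to be run as a file rather than through a
string C<eval>.

    use strict;
    use warnings;

    our @order;

    push @order, 'main';                # ordinary run-time code
    BEGIN { push @order, 'BEGIN 1' }    # runs as soon as it is parsed
    END   { print "END 1 (defined first, runs last)\n" }
    INIT  { push @order, 'INIT' }       # just before run time, FIFO
    CHECK { push @order, 'CHECK' }      # just after compilation, LIFO
    BEGIN { push @order, 'BEGIN 2' }
    END   { print join(', ', @order), "\n" }

    # Output:
    #   BEGIN 1, BEGIN 2, CHECK, INIT, main
    #   END 1 (defined first, runs last)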

=head2 Perl Classes

There is no special class syntax in Perl, but a package may act
as a class if it provides subroutines to act as methods.  Such a
package may also derive some of its methods from another class (package)
by listing the other package name(s) in its global @ISA array (which 
must be a package global, not a lexical).

For more on this, see L<perltoot> and L<perlobj>.
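
As a minimal illustration, with the class names C<Animal> and C<Dog>
invented for the purpose, a derived class needs nothing more than a
package, an C<@ISA> array, and whatever methods it wants to override:

    use strict;
    use warnings;

    package Animal;

    sub new {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }
    sub sound { return 'nothing' }
    sub speak {
        my $self = shift;
        return "$self->{name} says " . $self->sound;
    }

    package Dog;
    our @ISA = ('Animal');      # a package global, not a lexical
    sub sound { return 'Woof' } # overrides Animal::sound

    package main;
    print Dog->new(name => 'Rex')->speak, "\n";  # prints "Rex says Woof"

Here C<new> and C<speak> are found in C<Animal> through C<@ISA>, while
C<sound> is resolved in C<Dog> first.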

=head2 Perl Modules

A module is just a set of related functions in a library file, i.e.,
a Perl package with the same name as the file.  It is specifically 
designed to be reusable by other modules or programs.  It may do this
by providing a mechanism for exporting some of its symbols into the
symbol table of any package using it.  Or it may function as a class
definition and make its semantics available implicitly through
method calls on the class and its objects, without explicitly
exporting anything.  Or it can do a little of both.

For example, to start a traditional, non-OO module called Some::Module,
create a file called F<Some/Module.pm> and start with this template:

    package Some::Module;  # assumes Some/Module.pm

    use strict;
    use warnings;

    BEGIN {
        use Exporter   ();
        our ($VERSION, @ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);

        # set the version for version checking
        $VERSION     = 1.00;
        # if using RCS/CVS, this may be preferred
        $VERSION = do { my @r = (q$Revision: 2.21 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r }; # must be all one line, for MakeMaker

        @ISA         = qw(Exporter);
        @EXPORT      = qw(&func1 &func2);
        %EXPORT_TAGS = ( );     # eg: TAG => [ qw!name1 name2! ],

        # your exported package globals go here,
        # as well as any optionally exported functions
        @EXPORT_OK   = qw($Var1 %Hashit &func3);
    }
    our @EXPORT_OK;

    # exported package globals go here
    our $Var1;
    our %Hashit;

    # non-exported package globals go here
    our @more;
    our $stuff;

    # initialize package globals, first exported ones
    $Var1   = '';
    %Hashit = ();

    # then the others (which are still accessible as $Some::Module::stuff)
    $stuff  = '';
    @more   = ();

    # all file-scoped lexicals must be created before
    # the functions below that use them.

    # file-private lexicals go here
    my $priv_var    = '';
    my %secret_hash = ();

    # here's a file-private function as a closure,
    # callable as &$priv_func;  it cannot be prototyped.
    my $priv_func = sub {
        # stuff goes here.
    };

    # make all your functions, whether exported or not;
    # remember to put something interesting in the {} stubs
    sub func1      {}    # no prototype
    sub func2()    {}    # proto'd void
    sub func3($$)  {}    # proto'd to 2 scalars

    # this one isn't exported, but could be called!
    sub func4(\%)  {}    # proto'd to 1 hash ref

    END { }       # module clean-up code here (global destructor)

    ## YOUR CODE GOES HERE

    1;  # don't forget to return a true value from the file

Then go on to declare and use your variables in functions without
any qualifications.  See L<Exporter> and the L<perlmodlib> for
details on mechanics and style issues in module creation.

Perl modules are included into your program by saying

    use Module;

or

    use Module LIST;

This is exactly equivalent to

    BEGIN { require Module; import Module; }

or

    BEGIN { require Module; import Module LIST; }

As a special case

    use Module ();

is exactly equivalent to

    BEGIN { require Module; }
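
To make the equivalence concrete, here is a sketch that spells out by
hand what C<use List::Util 'sum'> does.  List::Util is used only
because it ships with Perl; the method-call form C<< Module->import >>
is the same call as the indirect-object form C<import Module> shown
above.

    use strict;
    use warnings;

    # Same effect as:  use List::Util 'sum';
    BEGIN {
        require List::Util;
        List::Util->import('sum');
    }

    print sum(1, 2, 3), "\n";   # prints 6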

All Perl module files have the extension F<.pm>.  The C<use> operator
assumes this so you don't have to spell out "F<Module.pm>" in quotes.
This also helps to differentiate new modules from old F<.pl> and
F<.ph> files.  Module names are also capitalized unless they're
functioning as pragmas; pragmas are in effect compiler directives,
and are sometimes called "pragmatic modules" (or even "pragmata"
if you're a classicist).

The two statements:

    require SomeModule;
    require "SomeModule.pm";		

differ from each other in two ways.  In the first case, any double
colons in the module name, such as C<Some::Module>, are translated
into your system's directory separator, usually "/".   The second
case does not, and would have to be specified literally.  The other
difference is that seeing the first C<require> clues in the compiler
that uses of indirect object notation involving "SomeModule", as
in C<$ob = purge SomeModule>, are method calls, not function calls.
(Yes, this really can make a difference.)
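
The first difference shows up in C<%INC>, which records every file
loaded by C<require>.  In this sketch (the two modules are chosen only
because they are part of the standard distribution), both forms end up
registering a path-style key:

    use strict;
    use warnings;

    require File::Basename;     # bareword: "::" becomes "/", ".pm" added
    require "File/Spec.pm";     # string: taken literally as a filename

    for my $file ('File/Basename.pm', 'File/Spec.pm') {
        print "$file loaded from $INC{$file}\n";
    }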

Because the C<use> statement implies a C<BEGIN> block, the importing
of semantics happens as soon as the C<use> statement is compiled,
before the rest of the file is compiled.  This is how it is able
to function as a pragma mechanism, and also how modules are able to
declare subroutines that are then visible as list or unary operators for
the rest of the current file.  This will not work if you use C<require>
instead of C<use>.  With C<require> you can get into this problem:

    require Cwd;		# make Cwd:: accessible
    $here = Cwd::getcwd();

    use Cwd;			# import names from Cwd::
    $here = getcwd();

    require Cwd;	    	# make Cwd:: accessible
    $here = getcwd(); 		# oops! no main::getcwd()

In general, C<use Module ()> is recommended over C<require Module>,
because it determines module availability at compile time, not in the
middle of your program's execution.  An exception would be if two modules
each tried to C<use> each other, and each also called a function from
that other module.  In that case, it's easy to use C<require>s instead.

Perl packages may be nested inside other package names, so we can have
package names containing C<::>.  But if we used that package name
directly as a filename it would make for unwieldy or impossible
filenames on some systems.  Therefore, if a module's name is, say,
C<Text::Soundex>, then its definition is actually found in the library
file F<Text/Soundex.pm>.

Perl modules always have a F<.pm> file, but there may also be
dynamically linked executables (often ending in F<.so>) or autoloaded
subroutine definitions (often ending in F<.al>) associated with the
module.  If so, these will be entirely transparent to the user of
the module.  It is the responsibility of the F<.pm> file to load
(or arrange to autoload) any additional functionality.  For example,
although the POSIX module happens to do both dynamic loading and
autoloading, the user can say just C<use POSIX> to get it all.

=head1 SEE ALSO

See L<perlmodlib> for general style issues related to building Perl
modules and classes, as well as descriptions of the standard library
and CPAN, L<Exporter> for how Perl's standard import/export mechanism
works, L<perltoot> and L<perltootc> for an in-depth tutorial on
creating classes, L<perlobj> for a hard-core reference document on
objects, L<perlsub> for an explanation of functions and scoping,
and L<perlxstut> and L<perlguts> for more information on writing
extension modules.

=head1 NAME

perlmodinstall - Installing CPAN Modules

=head1 DESCRIPTION

You can think of a module as the fundamental unit of reusable Perl
code; see L<perlmod> for details.  Whenever anyone creates a chunk of
Perl code that they think will be useful to the world, they register
as a Perl developer at http://www.cpan.org/modules/04pause.html
so that they can then upload their code to the CPAN.  The CPAN is the
Comprehensive Perl Archive Network and can be accessed at
http://www.cpan.org/ , and searched at http://search.cpan.org/ .

This documentation is for people who want to download CPAN modules
and install them on their own computer.

=head2 PREAMBLE

First, are you sure that the module isn't already on your system?  Try
C<perl -MFoo -e 1>.  (Replace "Foo" with the name of the module; for
instance, C<perl -MCGI::Carp -e 1>.)

If you don't see an error message, you have the module.  (If you do
see an error message, it's still possible you have the module, but
that it's not in your path, which you can display with C<perl -e
"print qq(@INC)">.)  For the remainder of this document, we'll assume
that you really honestly truly lack an installed module, but have
found it on the CPAN.
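
The same check can be made from inside a program.  The following is a
rough sketch rather than a recommended tool: C<module_location()> is a
helper invented here, and the second module name is assumed not to
exist on your system.

    use strict;
    use warnings;

    # Return the file a module was loaded from, or undef if it
    # cannot be found in @INC.
    sub module_location {
        my $module = shift;
        (my $file = "$module.pm") =~ s{::}{/}g;  # Foo::Bar -> Foo/Bar.pm
        if (eval { require $file; 1 }) {
            return $INC{$file};
        }
        return undef;
    }

    for my $module ('File::Basename', 'No::Such::Module::Installed') {
        my $where = module_location($module);
        print defined $where
            ? "$module found at $where\n"
            : "$module not found in \@INC\n";
    }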

So now you have a file ending in .tar.gz (or, less often, .zip).  You
know there's a tasty module inside.  There are four steps you must now
take:

=over 5

=item B<DECOMPRESS> the file

=item B<UNPACK> the file into a directory

=item B<BUILD> the module (sometimes unnecessary)

=item B<INSTALL> the module.

=back

Here's how to perform each step for each operating system.  This is
I<not> a substitute for reading the README and INSTALL files that
might have come with your module!

Also note that these instructions are tailored for installing the
module into your system's repository of Perl modules -- but you can
install modules into any directory you wish.  For instance, where I
say C<perl Makefile.PL>, you can substitute C<perl Makefile.PL
PREFIX=/my/perl_directory> to install the modules into
C</my/perl_directory>.  Then you can use the modules from your Perl
programs with C<use lib "/my/perl_directory/lib/site_perl";> or
sometimes just C<use lib "/my/perl_directory";>.  If you're on a system
that requires superuser/root access to install modules into the
directories you see when you type C<perl -e "print qq(@INC)">, you'll
want to install them into a local directory (such as your home
directory) and use this approach.
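
A quick way to confirm that the directory is being searched is to
print C<@INC> after the C<use lib> line.  The path below is the
placeholder used in this document, not a real location:

    use strict;
    use warnings;
    use lib "/my/perl_directory/lib/site_perl";

    # Directories added by "use lib" are searched before the
    # standard ones.
    print "$_\n" for @INC;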

=over 4

=item *

B<If you're on a Unix or Linux system,>

You can use Andreas Koenig's CPAN module
( http://www.cpan.org/modules/by-module/CPAN )
to automate the following steps, from DECOMPRESS through INSTALL.

A. DECOMPRESS

Decompress the file with C<gzip -d yourmodule.tar.gz>.

You can get gzip from ftp://prep.ai.mit.edu/pub/gnu.

Or, you can combine this step with the next to save disk space:

     gzip -dc yourmodule.tar.gz | tar -xof -

B. UNPACK

Unpack the result with C<tar -xof yourmodule.tar>.

C. BUILD

Go into the newly-created directory and type:

      perl Makefile.PL
      make
      make test

or

      perl Makefile.PL PREFIX=/my/perl_directory

to install it locally.  (Remember that if you do this, you'll have to
put C<use lib "/my/perl_directory";> near the top of the program that
is to use this module.)

D. INSTALL

While still in that directory, type:

      make install

Make sure you have the appropriate permissions to install the module
in your Perl 5 library directory.  Often, you'll need to be root.

That's all you need to do on Unix systems with dynamic linking.
Most Unix systems have dynamic linking -- if yours doesn't, or if for
another reason you have a statically-linked perl, B<and> the
module requires compilation, you'll need to build a new Perl binary
that includes the module.  Again, you'll probably need to be root.

=item *

B<If you're running ActivePerl (Win95/98/2K/NT/XP, Linux, Solaris)>

First, type C<ppm> from a shell and see whether ActiveState's PPM
repository has your module.  If so, you can install it with C<ppm> and
you won't have to bother with any of the other steps here.  You might
be able to use the CPAN instructions from the "Unix or Linux" section
above as well; give it a try.  Otherwise, you'll have to follow the
steps below.

   A. DECOMPRESS

You can use the shareware WinZip ( http://www.winzip.com ) to
decompress and unpack modules.

   B. UNPACK

If you used WinZip, this was already done for you.

   C. BUILD

Does the module require compilation (i.e. does it have files that end
in .xs, .c, .h, .y, .cc, .cxx, or .C)?  If it doesn't, go to INSTALL.
If it does, life is now officially tough for you, because you have to
compile the module yourself -- no easy feat on Windows.  You'll need
the C<nmake> utility, available at
ftp://ftp.microsoft.com/Softlib/MSLFILES/nmake15.exe.
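
If the distribution is large, a short script can answer the
"does it require compilation" question for you.  This is a sketch
only: C<compiled_sources()> is a name made up here, and the list of
extensions is simply the one given above.

    use strict;
    use warnings;
    use File::Find;

    # Return every file under $dir whose extension suggests that
    # the module needs a compiler.
    sub compiled_sources {
        my $dir = shift;
        my @found;
        find(sub {
            push @found, $File::Find::name
                if -f $_ && /\.(?:xs|c|h|y|cc|cxx|C)\z/;
        }, $dir);
        my @sorted = sort @found;
        return @sorted;
    }

    my @sources = compiled_sources(@ARGV ? $ARGV[0] : '.');
    if (@sources) {
        print "Needs compiling:\n", map { "  $_\n" } @sources;
    }
    else {
        print "No compiled sources found; go to INSTALL.\n";
    }

Run it from the unpacked module directory, or pass that directory as
the first argument.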

   D. INSTALL

Copy the module into your Perl's I<lib> directory.  That'll be one
of the directories you see when you type

   perl -e "print qq(@INC)"

=item *

B<If you're using a Macintosh,>


A. DECOMPRESS

First, make sure you have the latest B<cpan-mac> distribution (
http://www.cpan.org/authors/id/CNANDOR/ ), which has utilities for
doing all of the steps.  Read the cpan-mac directions carefully and
install it.  If you choose not to use cpan-mac for some reason, there
are alternatives listed here.

After installing cpan-mac, drop the module archive on the
B<untarzipme> droplet, which will decompress and unpack for you.

B<Or>, you can either use the shareware B<StuffIt Expander> program
( http://www.aladdinsys.com/expander/ )
in combination with B<DropStuff with Expander Enhancer>
( http://www.aladdinsys.com/dropstuff/ )
or the freeware B<MacGzip> program (
http://persephone.cps.unizar.es/general/gente/spd/gzip/gzip.html ).

B. UNPACK

If you're using untarzipme or StuffIt, the archive should be extracted
now.  B<Or>, you can use the freeware B<suntar> or I<Tar> (
http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/cmp/ ).

C. BUILD

Check the contents of the distribution.
Read the module's documentation, looking for
reasons why you might have trouble using it with MacPerl.  Look for
F<.xs> and F<.c> files, which normally denote that the distribution
must be compiled, and you cannot install it "out of the box."
(See L<"PORTABILITY">.)

If a module does not work on MacPerl but should, or needs to be
compiled, see if the module exists already as a port on the
MacPerl Module Porters site (http://pudge.net/mmp/).
For more information on doing XS with MacPerl yourself, see
Arved Sandstrom's XS tutorial (http://macperl.com/depts/Tutorials/),
and then consider uploading your binary to the CPAN and
registering it on the MMP site.

D. INSTALL

If you are using cpan-mac, just drop the folder on the
B<installme> droplet, and use the module.

B<Or>, if you aren't using cpan-mac, do some manual labor.

Make sure the newlines for the modules are in Mac format, not Unix format.
If they are not then you might have decompressed them incorrectly.  Check
your decompression and unpacking utilities settings to make sure they are
translating text files properly.

As a last resort, you can use the perl one-liner:

    perl -i.bak -pe 's/(?:\015)?\012/\015/g' <filenames>

on the source files.

Then move the files (probably just the F<.pm> files, though there
may be some additional ones, too; check the module documentation)
to their final destination: This will
most likely be in C<$ENV{MACPERL}site_lib:> (i.e.,
C<HD:MacPerl folder:site_lib:>).  You can add new paths to
the default C<@INC> in the Preferences menu item in the
MacPerl application (C<$ENV{MACPERL}site_lib:> is added
automagically).  Create whatever directory structures are required
(i.e., for C<Some::Module>, create
C<$ENV{MACPERL}site_lib:Some:> and put
C<Module.pm> in that directory).

Then run the following script (or something like it):

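     #!perl -w
     use AutoSplit;
     # $ENV{MACPERL} is the MacPerl folder path, ending in a colon
     my $dir = "$ENV{MACPERL}site_lib";
     # split Some::Module's autoloadable subs into site_lib:auto
     autosplit("$dir:Some:Module.pm", "$dir:auto", 0, 1, 1);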

=item *

B<If you're on the DJGPP port of DOS,>

   A. DECOMPRESS

djtarx ( ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2/ )
will both uncompress and unpack.

   B. UNPACK

See above.

   C. BUILD

Go into the newly-created directory and type:

      perl Makefile.PL
      make
      make test

You will need the packages mentioned in F<README.dos>
in the Perl distribution.

   D. INSTALL

While still in that directory, type:

     make install	

You will need the packages mentioned in F<README.dos> in the Perl distribution.

=item *

B<If you're on OS/2,>

Get the EMX development suite and gzip/tar, from either Hobbes (
http://hobbes.nmsu.edu ) or Leo ( http://www.leo.org ), and then follow
the instructions for Unix.

=item *

B<If you're on VMS,>

When downloading from CPAN, save your file with a C<.tgz>
extension instead of C<.tar.gz>.  All other periods in the
filename should be replaced with underscores.  For example,
C<Your-Module-1.33.tar.gz> should be downloaded as
C<Your-Module-1_33.tgz>.
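
The renaming rule is mechanical enough to script.  Here is one
possible sketch; C<vms_download_name()> is a hypothetical helper, not
part of any distributed tool.

    use strict;
    use warnings;

    # Turn a CPAN archive name into one that is safe to save on VMS.
    sub vms_download_name {
        my $name = shift;
        $name =~ s/\.tar\.gz\z/.tgz/;           # .tar.gz becomes .tgz
        my ($base, $ext) = $name =~ /\A(.*)\.([^.]+)\z/
            or return $name;                    # no extension: unchanged
        $base =~ tr/./_/;                       # remaining periods
        return "$base.$ext";
    }

    print vms_download_name('Your-Module-1.33.tar.gz'), "\n";
    # prints "Your-Module-1_33.tgz"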

A. DECOMPRESS

Type

    gzip -d Your-Module.tgz

or, for zipped modules, type

    unzip Your-Module.zip

Executables for gzip, zip, and VMStar:

    http://www.openvms.digital.com/freeware/
    http://www.crinoid.com/utils/

and their source code:

    http://www.fsf.org/order/ftp.html

Note that GNU's gzip/gunzip is not the same as Info-ZIP's zip/unzip
package.  The former is a simple compression tool; the latter permits
creation of multi-file archives.

B. UNPACK

If you're using VMStar:

     VMStar xf Your-Module.tar

Or, if you're fond of VMS command syntax:

     tar/extract/verbose Your-Module.tar

C. BUILD

Make sure you have MMS (from Digital) or the freeware MMK ( available
from MadGoat at http://www.madgoat.com ).  Then type this to create
the DESCRIP.MMS for the module:

    perl Makefile.PL

Now you're ready to build:

    mms
    mms test

Substitute C<mmk> for C<mms> above if you're using MMK.

D. INSTALL

Type

    mms install

Substitute C<mmk> for C<mms> above if you're using MMK.

=item *

B<If you're on MVS,>

Introduce the F<.tar.gz> file into an HFS as binary; don't translate from
ASCII to EBCDIC.

A. DECOMPRESS

Decompress the file with C<gzip -d yourmodule.tar.gz>.

You can get gzip from
http://www.s390.ibm.com/products/oe/bpxqp1.html

B. UNPACK

Unpack the result with

     pax -o to=IBM-1047,from=ISO8859-1 -r < yourmodule.tar

The BUILD and INSTALL steps are identical to those for Unix.  Some
modules generate Makefiles that work better with GNU make, which is
available from http://www.mks.com/s390/gnu/index.htm.

=back

=head1 PORTABILITY

Note that not all modules will work on all platforms.
See L<perlport> for more information on portability issues.
Read the documentation to see if the module will work on your
system.  There are basically three categories
of modules that will not work "out of the box" with all
platforms (with some possibility of overlap):

=over 4

=item *

B<Those that should, but don't.>  These need to be fixed; consider
contacting the author and possibly writing a patch.

=item *

B<Those that need to be compiled, where the target platform
doesn't have compilers readily available.>  (These modules contain
F<.xs> or F<.c> files, usually.)  You might be able to find
existing binaries on the CPAN or elsewhere, or you might
want to try getting compilers and building it yourself, and then
release the binary for other poor souls to use.

=item *

B<Those that are targeted at a specific platform.>
(Such as the Win32:: modules.)  If the module is targeted
specifically at a platform other than yours, you're out
of luck, most likely.

=back



Check the CPAN Testers if a module should work with your platform
but it doesn't behave as you'd expect, or you aren't sure whether or
not a module will work under your platform.  If the module you want
isn't listed there, you can test it yourself and let CPAN Testers know,
you can join CPAN Testers, or you can request it be tested.

    http://testers.cpan.org/


=head1 HEY

If you have any suggested changes for this page, let me know.  Please
don't send me mail asking for help on how to install your modules.
There are too many modules, and too few Orwants, for me to be able to
answer or even acknowledge all your questions.  Contact the module
author instead, or post to comp.lang.perl.modules, or ask someone
familiar with Perl on your operating system.

=head1 AUTHOR

Jon Orwant

orwant@tpj.com

The Perl Journal, http://tpj.com

with invaluable help from Chris Nandor, and valuable help from Brandon
Allbery, Charles Bailey, Graham Barr, Dominic Dunlop, Jarkko
Hietaniemi, Ben Holzman, Tom Horsley, Nick Ing-Simmons, Tuomas
J. Lukka, Laszlo Molnar, Alan Olsen, Peter Prymmer, Gurusamy Sarathy,
Christoph Spalinger, Dan Sugalski, Larry Virden, and Ilya Zakharevich.

First version July 22, 1998; last revised November 21, 2001.

=head1 COPYRIGHT

Copyright (C) 1998, 2001 Jon Orwant.  All Rights Reserved.

Permission is granted to make and distribute verbatim copies of this
documentation provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
documentation under the conditions for verbatim copying, provided also
that they are marked clearly as modified versions, that the authors'
names and title are unchanged (though subtitles and additional
authors' names may be added), and that the entire resulting derived
work is distributed under the terms of a permission notice identical
to this one.

Permission is granted to copy and distribute translations of this
documentation into another language, under the above conditions for
modified versions.

#!../miniperl

open (OUT, ">perlmodlib.tmp") or die $!;
my (@pragma, @mod);
open (MANIFEST, "../MANIFEST") or die $!;

while (<MANIFEST>) {
     my $filename;
     next unless s|^lib/|| or m|^ext/|;
     ($filename) = /(\S+)/;
     $filename =~ s|^[^/]+/|| if $filename =~ s|^ext/||;
     next unless $filename =~ /\.p(m|od)$/;
     next unless open (MOD, "../lib/$filename");

     my ($name, $thing);
     my $foundit=0;
     {
	 local $/="";
	 while (<MOD>) {
	     next unless /^=head1 NAME/;
	     $foundit++;
	     last;
	 }
     }
     unless ($foundit) {
	 warn "$filename missing head1\n";
	 next;
     }
     my $title = <MOD>;
     chomp($title);
     close MOD;

     my $perlname = $filename;
     $perlname =~ s!\.p(m|od)$!!;
     $perlname =~ s!/!::!g;

     ($name, $thing) = split / --? /, $title, 2;

     unless ($name and $thing) {
	 warn "$filename missing name\n"  unless $name;
	 warn "$filename missing thing\n" unless $thing;
	 next;
     }

     $thing =~ s/^perl pragma to //i;
     $thing = ucfirst($thing);
     $title = "=item $perlname\n\n$thing\n\n";

     # print "$perlname $thing\n";

     if ($filename=~/[A-Z]/) {
          push @mod, $title;
     } else {
          push @pragma, $title;
     }
}

print OUT <<'EOF';
# Generated by perlmodlib.PL  DO NOT EDIT!

=head1 NAME

perlmodlib - constructing new Perl modules and finding existing ones

=head1 DESCRIPTION

=head1 THE PERL MODULE LIBRARY

Many modules are included in the Perl distribution.  These are described
below, and all end in F<.pm>.  You may discover compiled library
files (usually ending in F<.so>) or small pieces of modules to be
autoloaded (ending in F<.al>); these were automatically generated
by the installation process.  You may also discover files in the
library directory that end in either F<.pl> or F<.ph>.  These are
old libraries supplied so that old programs that use them still
run.  The F<.pl> files will all eventually be converted into standard
modules, and the F<.ph> files made by B<h2ph> will probably end up
as extension modules made by B<h2xs>.  (Some F<.ph> values may
already be available through the POSIX, Errno, or Fcntl modules.)
The B<pl2pm> file in the distribution may help in your conversion,
but it's just a mechanical process and therefore far from bulletproof.

=head2 Pragmatic Modules

They work somewhat like compiler directives (pragmata) in that they
tend to affect the compilation of your program, and thus will usually
work well only when used within a C<use> or C<no>.  Most of these
are lexically scoped, so an inner BLOCK may countermand them
by saying:

    no integer;
    no strict 'refs';
    no warnings;

which lasts until the end of that BLOCK.

Some pragmas are lexically scoped--typically those that affect the
C<$^H> hints variable.  Others affect the current package instead,
like C<use vars> and C<use subs>, which allow you to predeclare
variables or subroutines within a particular I<file> rather than
just a block.  Such declarations are effective for the entire file
for which they were declared.  You cannot rescind them with C<no
vars> or C<no subs>.
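
As a minimal sketch of that lexical scoping (the subroutine name
C<divide_both_ways> is made up for illustration), the following turns
C<integer> on for a subroutine body and countermands it in an inner
BLOCK:

```perl

    use strict;
    use warnings;

    sub divide_both_ways {
        use integer;              # in effect to the end of this sub's block
        my ($n, $d) = @_;
        my $outer = $n / $d;      # integer division
        my $inner;
        {
            no integer;           # countermands the pragma ...
            $inner = $n / $d;     # ... so this is ordinary division
        }
        return ($outer, $inner);
    }

    my ($outer, $inner) = divide_both_ways(10, 4);
    print "with integer: $outer, with no integer: $inner\n";

```

With the arguments shown, the first value should be 2 and the second
2.5, because C<no integer> lasts only until the end of the inner BLOCK.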

The following pragmas are defined (and have their own documentation).

=over 12

EOF

print OUT $_ for (sort @pragma);

print OUT <<EOF;
=back

=head2 Standard Modules

Standard, bundled modules are all expected to behave in a well-defined
manner with respect to namespace pollution because they use the
Exporter module.  See their own documentation for details.

=over 12

EOF

print OUT $_ for (sort @mod);

print OUT <<'EOF';
=back

To find out I<all> modules installed on your system, including
those without documentation or outside the standard release,
just do this:

    % find `perl -e 'print "@INC"'` -name '*.pm' -print

They should all have their own documentation installed and accessible
via your system man(1) command.  If you do not have a B<find>
program, you can use the Perl B<find2perl> program instead, which
generates Perl code as output you can run through perl.  If you
have a B<man> program but it doesn't find your modules, you'll have
to fix your manpath.  See L<perl> for details.  If you have no
system B<man> command, you might try the B<perldoc> program.
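
If you have no B<find> program at all, the same search can be sketched
directly in Perl with the standard File::Find module.  This is only an
illustration; the subroutine name C<installed_module_files> is an
assumption, not part of any standard tool:

```perl

    use strict;
    use warnings;
    use File::Find;

    # Returns a sorted list of the .pm files found under the given
    # directories.
    sub installed_module_files {
        # Skip code references in @INC and directories that do not exist.
        my @dirs = grep { !ref($_) && -d $_ } @_;
        return () unless @dirs;
        my %seen;
        find(
            {
                wanted => sub {
                    $seen{$File::Find::name} = 1 if -f $_ && /\.pm\z/;
                },
                no_chdir => 1,    # $_ holds the full path in wanted()
            },
            @dirs
        );
        return sort keys %seen;
    }

    print "$_\n" for installed_module_files(@INC);

```

Like the B<find> command above, this lists files, not module names, and
it says nothing about whether documentation is installed for them.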

=head2 Extension Modules

Extension modules are written in C (or a mix of Perl and C).  They
are usually dynamically loaded into Perl if and when you need them,
but may also be linked in statically.  Supported extension modules
include Socket, Fcntl, and POSIX.

Many popular C extension modules do not come bundled (at least, not
completely) due to their sizes, volatility, or simply lack of time
for adequate testing and configuration across the multitude of
platforms on which Perl was beta-tested.  You are encouraged to
look for them on CPAN (described below), or using web search engines
like Alta Vista or Deja News.

=head1 CPAN

CPAN stands for Comprehensive Perl Archive Network; it's a globally
replicated trove of Perl materials, including documentation, style
guides, tricks and traps, alternate ports to non-Unix systems and
occasional binary distributions for these.   Search engines for
CPAN can be found at http://cpan.perl.com/ and at
http://theory.uwinnipeg.ca/mod_perl/cpan-search.pl .

Most importantly, CPAN includes around a thousand unbundled modules,
some of which require a C compiler to build.  Major categories of
modules are:

=over

=item *

Language Extensions and Documentation Tools

=item *

Development Support

=item *

Operating System Interfaces

=item *

Networking, Device Control (modems) and InterProcess Communication

=item *

Data Types and Data Type Utilities

=item *

Database Interfaces

=item *

User Interfaces

=item *

Interfaces to / Emulations of Other Programming Languages

=item *

File Names, File Systems and File Locking (see also File Handles)

=item *

String Processing, Language Text Processing, Parsing, and Searching

=item *

Option, Argument, Parameter, and Configuration File Processing

=item *

Internationalization and Locale

=item *

Authentication, Security, and Encryption

=item *

World Wide Web, HTML, HTTP, CGI, MIME

=item *

Server and Daemon Utilities

=item *

Archiving and Compression

=item *

Images, Pixmap and Bitmap Manipulation, Drawing, and Graphing

=item *

Mail and Usenet News

=item *

Control Flow Utilities (callbacks and exceptions etc)

=item *

File Handle and Input/Output Stream Utilities

=item *

Miscellaneous Modules

=back

Registered CPAN sites as of this writing include the following.
You should try to choose one close to you:

=head2 Africa

=over 4

=item *

South Africa

    ftp://ftp.is.co.za/programming/perl/CPAN/
    ftp://ftp.saix.net/pub/CPAN/
    ftp://ftpza.co.za/pub/mirrors/cpan/
    ftp://ftp.sun.ac.za/CPAN/

=back

=head2 Asia

=over 4

=item *

China

    ftp://freesoft.cei.gov.cn/pub/languages/perl/CPAN/
    http://www2.linuxforum.net/mirror/CPAN/
    http://cpan.shellhung.org/
    ftp://ftp.shellhung.org/pub/CPAN

=item *

Hong Kong

    http://CPAN.pacific.net.hk/
    ftp://ftp.pacific.net.hk/pub/mirror/CPAN/

=item *

Indonesia

    http://piksi.itb.ac.id/CPAN/
    ftp://mirrors.piksi.itb.ac.id/CPAN/
    http://CPAN.mweb.co.id/
    ftp://ftp.mweb.co.id/pub/languages/perl/CPAN/

=item *

Israel

    http://www.iglu.org.il:/pub/CPAN/
    ftp://ftp.iglu.org.il/pub/CPAN/
    http://bioinfo.weizmann.ac.il/pub/software/perl/CPAN/
    ftp://bioinfo.weizmann.ac.il/pub/software/perl/CPAN/

=item *

Japan

    ftp://ftp.u-aizu.ac.jp/pub/lang/perl/CPAN/
    ftp://ftp.kddlabs.co.jp/CPAN/
    http://mirror.nucba.ac.jp/mirror/Perl/
    ftp://mirror.nucba.ac.jp/mirror/Perl/
    ftp://ftp.meisei-u.ac.jp/pub/CPAN/
    ftp://ftp.jaist.ac.jp/pub/lang/perl/CPAN/
    ftp://ftp.dti.ad.jp/pub/lang/CPAN/
    ftp://ftp.ring.gr.jp/pub/lang/perl/CPAN/

=item *

Saudi Arabia

    ftp://ftp.isu.net.sa/pub/CPAN/

=item *

Singapore

    http://cpan.hjc.edu.sg
    http://ftp.nus.edu.sg/unix/perl/CPAN/
    ftp://ftp.nus.edu.sg/pub/unix/perl/CPAN/

=item *

South Korea

    http://CPAN.bora.net/
    ftp://ftp.bora.net/pub/CPAN/
    http://ftp.kornet.net/CPAN/
    ftp://ftp.kornet.net/pub/CPAN/
    ftp://ftp.nuri.net/pub/CPAN/

=item *

Taiwan

    ftp://coda.nctu.edu.tw/UNIX/perl/CPAN
    ftp://ftp.ee.ncku.edu.tw/pub/perl/CPAN/
    ftp://ftp1.sinica.edu.tw/pub1/perl/CPAN/

=item *

Thailand

    http://download.nectec.or.th/CPAN/
    ftp://ftp.nectec.or.th/pub/languages/CPAN/
    ftp://ftp.cs.riubon.ac.th/pub/mirrors/CPAN/

=back

=head2 Central America

=over 4

=item *

Costa Rica

    ftp://ftp.linux.co.cr/mirrors/CPAN/
    http://ftp.ucr.ac.cr/Unix/CPAN/
    ftp://ftp.ucr.ac.cr/pub/Unix/CPAN/

=back

=head2 Europe

=over 4

=item *

Austria

    ftp://ftp.tuwien.ac.at/pub/languages/perl/CPAN/

=item *

Belgium

    http://ftp.easynet.be/CPAN/
    ftp://ftp.easynet.be/CPAN/
    ftp://ftp.kulnet.kuleuven.ac.be/pub/mirror/CPAN/

=item *

Bulgaria

    ftp://ftp.ntrl.net/pub/mirrors/CPAN/

=item *

Croatia

    ftp://ftp.linux.hr/pub/CPAN/

=item *

Czech Republic

    http://www.fi.muni.cz/pub/perl/
    ftp://ftp.fi.muni.cz/pub/perl/
    ftp://sunsite.mff.cuni.cz/MIRRORS/ftp.funet.fi/pub/languages/perl/CPAN/

=item *

Denmark

    ftp://sunsite.auc.dk/pub/languages/perl/CPAN/
    http://www.cpan.dk/CPAN/
    ftp://www.cpan.dk/ftp.cpan.org/CPAN/

=item *

England

    http://www.mirror.ac.uk/sites/ftp.funet.fi/pub/languages/perl/CPAN
    ftp://ftp.mirror.ac.uk/sites/ftp.funet.fi/pub/languages/perl/CPAN/
    ftp://ftp.demon.co.uk/pub/mirrors/perl/CPAN/
    ftp://ftp.flirble.org/pub/languages/perl/CPAN/
    ftp://ftp.plig.org/pub/CPAN/
    ftp://sunsite.doc.ic.ac.uk/packages/CPAN/
    http://mirror.uklinux.net/CPAN/
    ftp://mirror.uklinux.net/pub/CPAN/
    ftp://usit.shef.ac.uk/pub/packages/CPAN/

=item *

Estonia

    ftp://ftp.ut.ee/pub/languages/perl/CPAN/

=item *

Finland

    ftp://ftp.funet.fi/pub/languages/perl/CPAN/

=item *

France

    ftp://cpan.ftp.worldonline.fr/pub/CPAN/
    ftp://ftp.club-internet.fr/pub/perl/CPAN/
    ftp://ftp.lip6.fr/pub/perl/CPAN/
    ftp://ftp.oleane.net/pub/mirrors/CPAN/
    ftp://ftp.pasteur.fr/pub/computing/CPAN/
    ftp://cpan.cict.fr/pub/CPAN/
    ftp://ftp.uvsq.fr/pub/perl/CPAN/

=item *

Germany

    ftp://ftp.rz.ruhr-uni-bochum.de/pub/CPAN/
    ftp://ftp.freenet.de/pub/ftp.cpan.org/pub/CPAN/
    ftp://ftp.uni-erlangen.de/pub/source/CPAN/
    ftp://ftp-stud.fht-esslingen.de/pub/Mirrors/CPAN
    ftp://ftp.gigabell.net/pub/CPAN/
    http://ftp.gwdg.de/pub/languages/perl/CPAN/
    ftp://ftp.gwdg.de/pub/languages/perl/CPAN/
    ftp://ftp.uni-hamburg.de/pub/soft/lang/perl/CPAN/
    ftp://ftp.leo.org/pub/comp/general/programming/languages/script/perl/CPAN/
    ftp://ftp.mpi-sb.mpg.de/pub/perl/CPAN/
    ftp://ftp.gmd.de/mirrors/CPAN/

=item *

Greece

    ftp://ftp.forthnet.gr/pub/languages/perl/CPAN
    ftp://ftp.ntua.gr/pub/lang/perl/

=item *

Hungary

    http://cpan.artifact.hu/
    ftp://cpan.artifact.hu/CPAN/
    ftp://ftp.kfki.hu/pub/packages/perl/CPAN/

=item *

Iceland

    http://cpan.gm.is/
    ftp://ftp.gm.is/pub/CPAN/

=item *

Ireland

    http://cpan.indigo.ie/
    ftp://cpan.indigo.ie/pub/CPAN/
    http://sunsite.compapp.dcu.ie/pub/perl/
    ftp://sunsite.compapp.dcu.ie/pub/perl/

=item *

Italy

    http://cpan.nettuno.it/
    http://gusp.dyndns.org/CPAN/
    ftp://gusp.dyndns.org/pub/CPAN
    http://softcity.iol.it/cpan
    ftp://softcity.iol.it/pub/cpan
    ftp://ftp.unina.it/pub/Other/CPAN/
    ftp://ftp.unipi.it/pub/mirror/perl/CPAN/
    ftp://cis.uniRoma2.it/CPAN/
    ftp://ftp.edisontel.it/pub/CPAN_Mirror/
    ftp://ftp.flashnet.it/pub/CPAN/

=item *

Latvia

    http://kvin.lv/pub/CPAN/

=item *

Netherlands

    ftp://download.xs4all.nl/pub/mirror/CPAN/
    ftp://ftp.nl.uu.net/pub/CPAN/
    ftp://ftp.nluug.nl/pub/languages/perl/CPAN/
    ftp://ftp.cpan.nl/pub/CPAN/
    http://www.cs.uu.nl/mirror/CPAN/
    ftp://ftp.cs.uu.nl/mirror/CPAN/

=item *

Norway

    ftp://sunsite.uio.no/pub/languages/perl/CPAN/
    ftp://ftp.uit.no/pub/languages/perl/cpan/

=item *

Poland

    ftp://ftp.pk.edu.pl/pub/lang/perl/CPAN/
    ftp://ftp.mega.net.pl/pub/mirrors/ftp.perl.com/
    ftp://ftp.man.torun.pl/pub/doc/CPAN/
    ftp://sunsite.icm.edu.pl/pub/CPAN/

=item *

Portugal

    ftp://ftp.ua.pt/pub/CPAN/
    ftp://perl.di.uminho.pt/pub/CPAN/
    ftp://ftp.ist.utl.pt/pub/CPAN/
    ftp://ftp.netc.pt/pub/CPAN/

=item *

Romania

    ftp://archive.logicnet.ro/mirrors/ftp.cpan.org/CPAN/
    ftp://ftp.kappa.ro/pub/mirrors/ftp.perl.org/pub/CPAN/
    ftp://ftp.dntis.ro/pub/cpan/
    ftp://ftp.opsynet.com/cpan/
    ftp://ftp.dnttm.ro/pub/CPAN/
    ftp://ftp.timisoara.roedu.net/mirrors/CPAN/

=item *

Russia

    ftp://ftp.chg.ru/pub/lang/perl/CPAN/
    http://cpan.rinet.ru/
    ftp://cpan.rinet.ru/pub/mirror/CPAN/
    ftp://ftp.aha.ru/pub/CPAN/
    ftp://ftp.sai.msu.su/pub/lang/perl/CPAN/

=item *

Slovakia

    ftp://ftp.entry.sk/pub/languages/perl/CPAN/

=item *

Slovenia

    ftp://ftp.arnes.si/software/perl/CPAN/

=item *

Spain

    ftp://ftp.rediris.es/mirror/CPAN/
    ftp://ftp.etse.urv.es/pub/perl/

=item *

Sweden

    http://ftp.du.se/CPAN/
    ftp://ftp.du.se/pub/CPAN/
    ftp://ftp.sunet.se/pub/lang/perl/CPAN/

=item *

Switzerland

    ftp://ftp.danyk.ch/CPAN/
    ftp://sunsite.cnlab-switch.ch/mirror/CPAN/

=item *

Turkey

    ftp://sunsite.bilkent.edu.tr/pub/languages/CPAN/

=back

=head2 North America

=over 4

=item *

Canada

=over 8

=item *

Alberta

    http://sunsite.ualberta.ca/pub/Mirror/CPAN/
    ftp://sunsite.ualberta.ca/pub/Mirror/CPAN/

=item *

Manitoba

    http://theoryx5.uwinnipeg.ca/pub/CPAN/
    ftp://theoryx5.uwinnipeg.ca/pub/CPAN/

=item *

Nova Scotia

    ftp://cpan.chebucto.ns.ca/pub/CPAN/

=item *

Ontario

    ftp://ftp.crc.ca/pub/packages/lang/perl/CPAN/

=back

=item *

Mexico

    http://www.msg.com.mx/CPAN/
    ftp://ftp.msg.com.mx/pub/CPAN/

=item *

United States

=over 8

=item *

Alabama

    http://mirror.hiwaay.net/CPAN/
    ftp://mirror.hiwaay.net/CPAN/

=item *

California

    http://www.cpan.org/
    ftp://ftp.cpan.org/CPAN/
    ftp://cpan.nas.nasa.gov/pub/perl/CPAN/
    ftp://ftp.digital.com/pub/plan/perl/CPAN/
    http://www.kernel.org/pub/mirrors/cpan/
    ftp://ftp.kernel.org/pub/mirrors/cpan/
    http://www.perl.com/CPAN/
    http://download.sourceforge.net/mirrors/CPAN/

=item *

Colorado

    ftp://ftp.cs.colorado.edu/pub/perl/CPAN/

=item *

Florida

    ftp://ftp.cise.ufl.edu/pub/perl/CPAN/

=item *

Georgia

    ftp://ftp.twoguys.org/CPAN/

=item *

Illinois

    http://www.neurogames.com/mirrors/CPAN
    http://uiarchive.uiuc.edu/mirrors/ftp/ftp.cpan.org/pub/CPAN/
    ftp://uiarchive.uiuc.edu/mirrors/ftp/ftp.cpan.org/pub/CPAN/

=item *

Indiana

    ftp://ftp.uwsg.indiana.edu/pub/perl/CPAN/
    http://cpan.nitco.com/
    ftp://cpan.nitco.com/pub/CPAN/
    ftp://cpan.in-span.net/
    http://csociety-ftp.ecn.purdue.edu/pub/CPAN
    ftp://csociety-ftp.ecn.purdue.edu/pub/CPAN

=item *

Kentucky

    http://cpan.uky.edu/
    ftp://cpan.uky.edu/pub/CPAN/

=item *

Massachusetts

    ftp://ftp.ccs.neu.edu/net/mirrors/ftp.funet.fi/pub/languages/perl/CPAN/
    ftp://ftp.iguide.com/pub/mirrors/packages/perl/CPAN/

=item *

New Jersey

    ftp://ftp.cpanel.net/pub/CPAN/

=item *

New York

    ftp://ftp.freesoftware.com/pub/perl/CPAN/
    http://www.deao.net/mirrors/CPAN/
    ftp://ftp.deao.net/pub/CPAN/
    ftp://ftp.stealth.net/pub/mirrors/ftp.cpan.org/pub/CPAN/
    http://mirror.nyc.anidea.com/CPAN/
    ftp://mirror.nyc.anidea.com/pub/CPAN/
    http://www.rge.com/pub/languages/perl/
    ftp://ftp.rge.com/pub/languages/perl/
    ftp://mirrors.cloud9.net/pub/mirrors/CPAN/

=item *

North Carolina

    ftp://ftp.duke.edu/pub/perl/

=item *

Ohio

    ftp://ftp.loaded.net/pub/CPAN/

=item *

Oklahoma

    ftp://ftp.ou.edu/mirrors/CPAN/

=item *

Oregon

    ftp://ftp.orst.edu/pub/packages/CPAN/

=item *

Pennsylvania

    http://ftp.epix.net/CPAN/
    ftp://ftp.epix.net/pub/languages/perl/
    ftp://carroll.cac.psu.edu/pub/CPAN/

=item *

Tennessee

    ftp://ftp.sunsite.utk.edu/pub/CPAN/

=item *

Texas

    http://ftp.sedl.org/pub/mirrors/CPAN/
    http://jhcloos.com/pub/mirror/CPAN/
    ftp://jhcloos.com/pub/mirror/CPAN/

=item *

Utah

    ftp://mirror.xmission.com/CPAN/

=item *

Virginia

    http://mirrors.rcn.net/pub/lang/CPAN/
    ftp://mirrors.rcn.net/pub/lang/CPAN/
    ftp://ruff.cs.jmu.edu/pub/CPAN/
    http://perl.Liquidation.com/CPAN/

=item *

Washington

    http://cpan.llarian.net/
    ftp://cpan.llarian.net/pub/CPAN/
    ftp://ftp-mirror.internap.com/pub/CPAN/
    ftp://ftp.spu.edu/pub/CPAN/

=back

=back

=head2 Oceania

=over 4

=item *

Australia

    http://ftp.planetmirror.com/pub/CPAN/
    ftp://ftp.planetmirror.com/pub/CPAN/
    ftp://mirror.aarnet.edu.au/pub/perl/CPAN/
    ftp://cpan.topend.com.au/pub/CPAN/

=item *

New Zealand

    ftp://ftp.auckland.ac.nz/pub/perl/CPAN/

=back

=head2 South America

=over 4

=item *

Argentina

    ftp://mirrors.bannerlandia.com.ar/mirrors/CPAN/

=item *

Brazil

    ftp://cpan.pop-mg.com.br/pub/CPAN/
    ftp://ftp.matrix.com.br/pub/perl/
    ftp://cpan.if.usp.br/pub/mirror/CPAN/

=item *

Chile

    ftp://ftp.psinet.cl/pub/programming/perl/CPAN/
    ftp://sunsite.dcc.uchile.cl/pub/lang/perl/

=back

For an up-to-date listing of CPAN sites,
see http://www.cpan.org/SITES or ftp://www.cpan.org/SITES .

=head1 Modules: Creation, Use, and Abuse

(The following section is borrowed directly from Tim Bunce's modules
file, available at your nearest CPAN site.)

Perl implements a class using a package, but the presence of a
package doesn't imply the presence of a class.  A package is just a
namespace.  A class is a package that provides subroutines that can be
used as methods.  A method is just a subroutine that expects, as its
first argument, either the name of a package (for "static" methods),
or a reference to something (for "virtual" methods).

A module is a file that (by convention) provides a class of the same
name (sans the .pm), plus an import method in that class that can be
called to fetch exported symbols.  This module may implement some of
its methods by loading dynamic C or C++ objects, but that should be
totally transparent to the user of the module.  Likewise, the module
might set up an AUTOLOAD function to slurp in subroutine definitions on
demand, but this is also transparent.  Only the F<.pm> file is required to
exist.  See L<perlsub>, L<perltoot>, and L<AutoLoader> for details about
the AUTOLOAD mechanism.
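
To make the terms concrete, here is a small sketch of a package used as
a class.  The class name C<Counter> and its methods are invented for
this example:

```perl

    package Counter;
    use strict;
    use warnings;

    # "Static" method: the first argument is the name of a package.
    sub new {
        my ($class, %args) = @_;
        return bless { count => $args{start} || 0 }, $class;
    }

    # "Virtual" method: the first argument is a blessed reference.
    sub increment {
        my $self = shift;
        return ++$self->{count};
    }

    package main;

    my $counter = Counter->new(start => 5);
    print $counter->increment, "\n";    # prints 6

```

In a real module the C<Counter> package would live in F<Counter.pm> and
be loaded with C<use Counter;>.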

=head2 Guidelines for Module Creation

=over 4

=item  *

Do similar modules already exist in some form?

If so, please try to reuse the existing modules either in whole or
by inheriting useful features into a new class.  If this is not
practical try to get together with the module authors to work on
extending or enhancing the functionality of the existing modules.
The plethora of packages in perl4 for dealing with command line
options is a perfect example of the duplication to avoid.

If you are writing a module to expand an already existing set of
modules, please coordinate with the author of the package.  It
helps if you follow the same naming scheme and module interaction
scheme as the original author.

=item  *

Try to design the new module to be easy to extend and reuse.

Try to C<use warnings;> (or C<use warnings qw(...);>).
Remember that you can add C<no warnings qw(...);> to individual blocks
of code that need fewer warnings.

Use blessed references.  Use the two argument form of bless to bless
into the class name given as the first parameter of the constructor,
e.g.:

 sub new {
     my $class = shift;
     return bless {}, $class;
 }

or even this if you'd like it to be used as either a static
or a virtual method.

 sub new {
     my $self  = shift;
     my $class = ref($self) || $self;
     return bless {}, $class;
 }

Pass arrays as references so more parameters can be added later
(it's also faster).  Convert functions into methods where
appropriate.  Split large methods into smaller more flexible ones.
Inherit methods from other modules if appropriate.
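
A short sketch of why passing an array by reference leaves room for
growth (the function C<total> and its C<scale> option are hypothetical):

```perl

    use strict;
    use warnings;

    # The list arrives as one reference, so named options can follow it.
    sub total {
        my ($values, %opt) = @_;
        my $sum = 0;
        $sum += $_ for @$values;
        $sum *= $opt{scale} if defined $opt{scale};
        return $sum;
    }

    my @prices = (1, 2, 3);
    print total(\@prices), "\n";                 # prints 6
    print total(\@prices, scale => 10), "\n";    # prints 60

```

Had C<total> taken a flat list, there would be no clean way to add the
C<scale> option later without breaking existing callers.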

Avoid class name tests like: C<die "Invalid" unless ref $ref eq 'FOO'>.
Generally you can delete the C<eq 'FOO'> part with no harm at all.
Let the objects look after themselves! Generally, avoid hard-wired
class names as far as possible.

Avoid C<< $r->Class::func() >> where using C<@ISA=qw(... Class ...)> and
C<< $r->func() >> would work (see L<perlbot> for more details).

Use AutoSplit so little-used or newly added functions won't be a
burden to programs that don't use them. Add test functions to
the module after __END__ either using AutoSplit or by saying:

 eval join('',<main::DATA>) || die $@ unless caller();

Does your module pass the 'empty subclass' test? If you say
C<@SUBCLASS::ISA = qw(YOURCLASS);> your applications should be able
to use SUBCLASS in exactly the same way as YOURCLASS.  For example,
does your application still work if you change:  C<$obj = new YOURCLASS;>
into: C<$obj = new SUBCLASS;> ?
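
The 'empty subclass' test can be sketched as follows, assuming two
made-up classes C<Animal> and C<Pet>.  It passes only because C<new>
uses the two argument form of bless:

```perl

    package Animal;
    use strict;
    use warnings;

    sub new {
        my ($class, %args) = @_;
        return bless { name => $args{name} }, $class;
    }

    sub name { return $_[0]{name} }

    package Pet;
    our @ISA = ('Animal');    # the empty subclass: inheritance only

    package main;

    for my $class (qw(Animal Pet)) {
        my $obj = $class->new(name => 'Rex');
        print ref($obj), ': ', $obj->name, "\n";
    }

```

If C<new> had blessed into a hard-wired C<'Animal'>, the second object
would not be a C<Pet> and the test would fail.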

Avoid keeping any state information in your packages. It makes it
difficult for multiple other packages to use yours. Keep state
information in objects.
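
As an illustration of keeping state in objects, consider this sketch
(the C<Tally> class is invented for the example).  Each object carries
its own list, so two users of the class cannot disturb each other:

```perl

    package Tally;
    use strict;
    use warnings;

    sub new {
        my $class = shift;
        return bless { items => [] }, $class;    # state lives in the object
    }

    # Adds an item and returns how many items this object now holds.
    sub add {
        my ($self, $item) = @_;
        push @{ $self->{items} }, $item;
        return scalar @{ $self->{items} };
    }

    package main;

    my $first  = Tally->new;
    my $second = Tally->new;
    $first->add('apple');
    $first->add('pear');
    print $second->add('plum'), "\n";    # prints 1

```

Had the list been a single package variable, the last line would have
reported 3 instead.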

Always use B<-w>.

Try to C<use strict;> (or C<use strict qw(...);>).
Remember that you can add C<no strict qw(...);> to individual blocks
of code that need less strictness.

Always use B<-w>.

Follow the guidelines in the perlstyle(1) manual.

Always use B<-w>.

=item  *

Some simple style guidelines

The perlstyle manual supplied with Perl has many helpful points.

Coding style is a matter of personal taste. Many people evolve their
style over several years as they learn what helps them write and
maintain good code.  Here's one set of assorted suggestions that
seem to be widely used by experienced developers:

Use underscores to separate words.  It is generally easier to read
$var_names_like_this than $VarNamesLikeThis, especially for
non-native speakers of English. It's also a simple rule that works
consistently with VAR_NAMES_LIKE_THIS.

Package/Module names are an exception to this rule. Perl informally
reserves lowercase module names for 'pragma' modules like integer
and strict. Other modules normally begin with a capital letter and
use mixed case with no underscores (they need to be short and portable).

You may find it helpful to use letter case to indicate the scope
or nature of a variable. For example:

 $ALL_CAPS_HERE   constants only (beware clashes with Perl vars)
 $Some_Caps_Here  package-wide global/static
 $no_caps_here    function scope my() or local() variables

Function and method names seem to work best as all lowercase.
e.g., C<< $obj->as_string() >>.

You can use a leading underscore to indicate that a variable or
function should not be used outside the package that defined it.

=item  *

Select what to export.

Do NOT export method names!

Do NOT export anything else by default without a good reason!

Exports pollute the namespace of the module user.  If you must
export try to use @EXPORT_OK in preference to @EXPORT and avoid
short or common names to reduce the risk of name clashes.

Generally anything not exported is still accessible from outside the
module using the ModuleName::item_name (or C<< $blessed_ref->method >>)
syntax.  By convention you can use a leading underscore on names to
indicate informally that they are 'internal' and not for public use.

(It is actually possible to get private functions by saying:
C<my $subref = sub { ... };  &$subref;>.  But there's no way to call that
directly as a method, because a method must have a name in the symbol
table.)
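
A sketch of such a private function, using an invented C<Temperature>
package, might look like this:

```perl

    package Temperature;
    use strict;
    use warnings;

    # Private: visible only inside this file's lexical scope.
    my $to_celsius = sub {
        my ($fahrenheit) = @_;
        return ($fahrenheit - 32) * 5 / 9;
    };

    # Public method that relies on the private helper.
    sub describe {
        my ($class, $fahrenheit) = @_;
        return sprintf('%.1f C', $to_celsius->($fahrenheit));
    }

    package main;

    print Temperature->describe(212), "\n";    # prints 100.0 C

```

Because the helper has no entry in the symbol table, code outside the
file cannot reach it by name.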

As a general rule, if the module is trying to be object oriented
then export nothing. If it's just a collection of functions then
@EXPORT_OK anything but use @EXPORT with caution.
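
For a collection of functions, the advice above might be sketched like
this.  The module name C<My::Text::Util> and the function
C<trim_whitespace> are assumptions made for the example, and both
packages are shown in one file only to keep it self-contained:

```perl

    package My::Text::Util;
    use strict;
    use warnings;
    use Exporter ();

    our @ISA       = ('Exporter');
    our @EXPORT    = ();                     # nothing by default
    our @EXPORT_OK = qw(trim_whitespace);    # only on request

    sub trim_whitespace {
        my ($string) = @_;
        $string =~ s/^\s+|\s+$//g;
        return $string;
    }

    package main;

    # With the module in its own file, this line would instead be:
    #     use My::Text::Util qw(trim_whitespace);
    My::Text::Util->import('trim_whitespace');

    print '[', trim_whitespace('  hello  '), "]\n";    # prints [hello]

```

A caller who imports nothing can still reach the function as
C<My::Text::Util::trim_whitespace()>.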

=item  *

Select a name for the module.

This name should be as descriptive, accurate, and complete as
possible.  Avoid any risk of ambiguity. Always try to use two or
more whole words.  Generally the name should reflect what is special
about what the module does rather than how it does it.  Please use
nested module names to group informally or categorize a module.
There should be a very good reason for a module not to have a nested name.
Module names should begin with a capital letter.

Having 57 modules all called Sort will not make life easy for anyone
(though having 23 called Sort::Quick is only marginally better :-).
Imagine someone trying to install your module alongside many others.
If in any doubt ask for suggestions in comp.lang.perl.misc.

If you are developing a suite of related modules/classes it's good
practice to use nested classes with a common prefix as this will
avoid namespace clashes. For example: Xyz::Control, Xyz::View,
Xyz::Model etc. Use the modules in this list as a naming guide.

If adding a new module to a set, follow the original author's
standards for naming modules and the interface to methods in
those modules.

If developing modules for private internal or project specific use,
that will never be released to the public, then you should ensure
that their names will not clash with any future public module. You
can do this either by using the reserved Local::* category or by
using a category name that includes an underscore like Foo_Corp::*.

To be portable each component of a module name should be limited to
11 characters. If it might be used on MS-DOS then try to ensure each is
unique in the first 8 characters. Nested modules make this easier.

=item  *

Have you got it right?

How do you know that you've made the right decisions? Have you
picked an interface design that will cause problems later? Have
you picked the most appropriate name? Do you have any questions?

The best way to know for sure, and pick up many helpful suggestions,
is to ask someone who knows. Comp.lang.perl.misc is read by just about
all the people who develop modules and it's the best place to ask.

All you need to do is post a short summary of the module, its
purpose and interfaces. A few lines on each of the main methods is
probably enough. (If you post the whole module it might be ignored
by busy people - generally the very people you want to read it!)

Don't worry about posting if you can't say when the module will be
ready - just say so in the message. It might be worth inviting
others to help you; they may be able to complete it for you!

=item  *

README and other Additional Files.

It's well known that software developers usually fully document the
software they write. If, however, the world is in urgent need of
your software and there is not enough time to write the full
documentation please at least provide a README file containing:

=over 10

=item *

A description of the module/package/extension etc.

=item *

A copyright notice - see below.

=item *

Prerequisites - what else you may need to have.

=item *

How to build it - possible changes to Makefile.PL etc.

=item *

How to install it.

=item *

Recent changes in this release, especially incompatibilities

=item *

Changes / enhancements you plan to make in the future.

=back

If the README file seems to be getting too large you may wish to
split out some of the sections into separate files: INSTALL,
Copying, ToDo etc.

=over 4

=item Adding a Copyright Notice.


How you choose to license your work is a personal decision.
The general mechanism is to assert your Copyright and then make
a declaration of how others may copy/use/modify your work.

Perl, for example, is supplied with two types of licence: The GNU
GPL and The Artistic Licence (see the files README, Copying, and
Artistic).  Larry has good reasons for NOT just using the GNU GPL.

My personal recommendation, out of respect for Larry, Perl, and the
Perl community at large is to state something simply like:

 Copyright (c) 1995 Your Name. All rights reserved.
 This program is free software; you can redistribute it and/or
 modify it under the same terms as Perl itself.

This statement should at least appear in the README file. You may
also wish to include it in a Copying file and your source files.
Remember to include the other words in addition to the Copyright.

=item  *

Give the module a version/issue/release number.

To be fully compatible with the Exporter and MakeMaker modules you
should store your module's version number in a non-my package
variable called $VERSION.  This should be a floating point
number with at least two digits after the decimal (i.e., hundredths,
e.g., C<$VERSION = "0.01">).  Don't use a "1.3.2" style version.
See L<Exporter> for details.

It may be handy to add a function or method to retrieve the number.
Use the number in announcements and archive file names when
releasing the module (ModuleName-1.02.tar.Z).
See L<ExtUtils::MakeMaker> for details.
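
A minimal sketch, with C<My::Module> as a placeholder name, of a
version number stored this way and read back through the C<VERSION>
method that every class inherits from UNIVERSAL:

```perl

    package My::Module;
    use strict;
    use warnings;

    our $VERSION = '0.01';    # a package variable, not a my() variable

    package main;

    my $version = My::Module->VERSION;
    print "$version\n";    # prints 0.01

    # Asking for a newer version than the one present dies.
    my $ok = eval { My::Module->VERSION('0.02'); 1 };
    print "version 0.02 is not available\n" unless $ok;

```

The same check is what makes C<use My::Module 0.02;> fail when only
0.01 is installed.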

=item  *

How to release and distribute a module.

It's a good idea to post an announcement of the availability of your
module (or the module itself if small) to the comp.lang.perl.announce
Usenet newsgroup.  This will at least ensure very wide once-off
distribution.

If possible, register the module with CPAN.  You should
include details of its location in your announcement.

Some notes about ftp archives: Please use a long descriptive file
name that includes the version number. Most incoming directories
will not be readable/listable, i.e., you won't be able to see your
file after uploading it. Remember to send your email notification
message as soon as possible after uploading else your file may get
deleted automatically. Allow time for the file to be processed
and/or check the file has been processed before announcing its
location.

FTP Archives for Perl Modules:

Follow the instructions and links on:

   http://www.cpan.org/modules/00modlist.long.html
   http://www.cpan.org/modules/04pause.html

or upload to one of these sites:

   https://pause.kbx.de/pause/
   http://pause.perl.org/pause/

and notify <modules@perl.org>.

By using the WWW interface you can ask the Upload Server to mirror
your modules from your ftp or WWW site into your own directory on
CPAN!

Please remember to send me an updated entry for the Module list!

=item  *

Take care when changing a released module.

Always strive to remain compatible with previous released versions.
Otherwise try to add a mechanism to revert to the
old behavior if people rely on it.  Document incompatible changes.

=back

=back

=head2 Guidelines for Converting Perl 4 Library Scripts into Modules

=over 4

=item  *

There is no requirement to convert anything.

If it ain't broke, don't fix it! Perl 4 library scripts should
continue to work with no problems. You may need to make some minor
changes (like escaping non-array @'s in double quoted strings) but
there is no need to convert a .pl file into a Module for just that.

=item  *

Consider the implications.

All Perl applications that make use of the script will need to
be changed (slightly) if the script is converted into a module.  Is
it worth it unless you plan to make other changes at the same time?

=item  *

Make the most of the opportunity.

If you are going to convert the script to a module you can use the
opportunity to redesign the interface.  The guidelines for module
creation above include many of the issues you should consider.

=item  *

The pl2pm utility will get you started.

This utility will read *.pl files (given as parameters) and write
corresponding *.pm files. The pl2pm utility does the following:

=over 10

=item *

Adds the standard Module prologue lines

=item *

Converts package specifiers from ' to ::

=item *

Converts die(...) to croak(...)

=item *

Several other minor changes

=back

Being a mechanical process, pl2pm is not bulletproof. The converted
code will need careful checking, especially any package statements.
Don't delete the original .pl file till the new .pm one works!
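
To give a feel for the result, the sketch below shows what a tiny Perl 4
library might look like once converted.  It was written by hand and is
not actual pl2pm output; the names C<TextUtil> and C<trim> are invented
for the example, and both packages share one file only so that the
sketch can be run as it stands.

    use strict;
    use warnings;

    # Perl 4 style original, in a hypothetical file textutil.pl:
    #
    #     package textutil;
    #     sub main'trim {
    #         local($string) = @_;
    #         die "trim: no string given" unless defined($string);
    #         $string =~ s/^\s+//;
    #         $string =~ s/\s+$//;
    #         $string;
    #     }
    #     1;

    package TextUtil;               # module form, normally in TextUtil.pm
    require Exporter;               # prologue: inherit an import method
    use Carp;                       # prologue: provides croak()
    our @ISA    = qw(Exporter);
    our @EXPORT = qw(trim);

    sub trim {
        my ($string) = @_;
        # croak() reports the error from the caller's point of view
        croak "trim: no string given" unless defined $string;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
    }

    package main;

    TextUtil->import;               # stands in for "use TextUtil;"
    my $clean = trim("  hello  ");
    print "[$clean]\n";

    eval { trim(undef) };
    print "caught: $@" if $@;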

=back

=head2 Guidelines for Reusing Application Code

=over 4

=item  *

Complete applications rarely belong in the Perl Module Library.

=item  *

Many applications contain some Perl code that could be reused.

Help save the world! Share your code in a form that makes it easy
to reuse.

=item  *

Break out the reusable code into one or more separate module files.

=item  *

Take the opportunity to reconsider and redesign the interfaces.

=item  *

In some cases the 'application' can then be reduced to a small
fragment of code built on top of the reusable modules. In these cases
the application could be invoked as:

     % perl -e 'use Module::Name; method(@ARGV)' ...
or
     % perl -mModule::Name ...    (in perl5.002 or higher)

=back

=head1 NOTE

Perl does not enforce private and public parts of its modules as you may
have been used to in other languages like C++, Ada, or Modula-17.  Perl
doesn't have an infatuation with enforced privacy.  It would prefer
that you stayed out of its living room because you weren't invited, not
because it has a shotgun.

The module and its user have a contract, part of which is common law,
and part of which is "written".  Part of the common law contract is
that a module doesn't pollute any namespace it wasn't asked to.  The
written contract for the module (A.K.A. documentation) may make other
provisions.  But then you know when you C<use RedefineTheWorld> that
you're redefining the world and willing to take the consequences.
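
Here is a small sketch of that contract, using the standard Exporter
module.  The module C<My::Shapes> and its functions are hypothetical,
and the two packages share one file only so the example runs as it
stands.  The module exports a name only when asked, and its "private"
helper is private by convention rather than by force.

    use strict;
    use warnings;

    package My::Shapes;             # hypothetical, normally My/Shapes.pm
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT_OK = qw(area);      # exported only when asked for

    sub area {
        my ($width, $height) = @_;
        _check($width, $height);
        return $width * $height;
    }

    # The leading underscore marks this as private by convention only.
    sub _check {
        for my $n (@_) {
            die "dimension must be positive\n" unless $n > 0;
        }
        return 1;
    }

    package main;

    My::Shapes->import('area');     # stands in for "use My::Shapes qw(area);"
    my $size = area(3, 4);
    print "area is $size\n";

    # Nothing else leaked into our namespace ...
    print "_check was not imported\n" unless defined &main::_check;

    # ... but nothing stops an uninvited caller from using the full name.
    print "still reachable\n" if My::Shapes::_check(1);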
EOF

close MANIFEST or warn "$0: failed to close MANIFEST (../MANIFEST): $!";
close OUT      or warn "$0: failed to close OUT (perlmodlib.tmp): $!";

ta)	pod:pod/macperldelta.pod
MacPerl built-in routines	pod:lib/MacPerl.pm
Macintosh Toolbox Modules	!
Overview	pod:lib/Mac/Toolbox.pod
-(	
MacOS Types	pod:lib/Mac/Types.pm
Event Manager	pod:lib/Mac/Events.pm
Window Manager	pod:lib/Mac/Windows.pm
Window Panes	pod:lib/Mac/Pane.pm
QuickDraw	pod:lib/Mac/QuickDraw.pm
Offscreen
# Generated by perlmodlib.PL  DO NOT EDIT!

=head1 NAME

perlmodlib - constructing new Perl modules and finding existing ones

=head1 DESCRIPTION

=head1 THE PERL MODULE LIBRARY

Many modules are included in the Perl distribution.  These are described
below, and all end in F<.pm>.  You may discover compiled library
files (usually ending in F<.so>) or small pieces of modules to be
autoloaded (ending in F<.al>); these were automatically generated
by the installation process.  You may also discover files in the
library directory that end in either F<.pl> or F<.ph>.  These are
old libraries supplied so that old programs that use them still
run.  The F<.pl> files will all eventually be converted into standard
modules, and the F<.ph> files made by B<h2ph> will probably end up
as extension modules made by B<h2xs>.  (Some F<.ph> values may
already be available through the POSIX, Errno, or Fcntl modules.)
The B<pl2pm> file in the distribution may help in your conversion,
but it's just a mechanical process and therefore far from bulletproof.

=head2 Pragmatic Modules

Pragmatic modules work somewhat like compiler directives (pragmata) in
that they tend to affect the compilation of your program, and thus will
usually work well only when used within a C<use> or C<no>.  Most of these
are lexically scoped, so an inner BLOCK may countermand them
by saying:

    no integer;
    no strict 'refs';
    no warnings;

which lasts until the end of that BLOCK.
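
A minimal sketch of that scoping, using the C<integer> pragma (the
variable name is only for illustration):

    use strict;
    use warnings;

    my @results;
    {
        use integer;
        push @results, 10 / 3;      # integer division
        {
            no integer;             # countermands the outer pragma ...
            push @results, 10 / 4;  # ... so this is floating point
        }                           # ... until the end of this BLOCK
        push @results, 10 / 3;      # integer again
    }
    push @results, 10 / 4;          # outside both blocks: floating point

    print "@results\n";             # prints "3 2.5 3 2.5"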

Some pragmas are lexically scoped--typically those that affect the
C<$^H> hints variable.  Others affect the current package instead,
like C<use vars> and C<use subs>, which allow you to predeclare
variables or subroutines within a particular I<file> rather than
just a block.  Such declarations are effective for the entire file
for which they were declared.  You cannot rescind them with C<no
vars> or C<no subs>.
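
The sketch below declares C<$count> and C<bump> (both hypothetical
names) inside a block, to show that such declarations are not confined
to the block the way a lexically scoped pragma would be.

    use strict;
    use warnings;

    {
        # Declared inside a block, yet still in force after it ends:
        # "use vars" and "use subs" act on the current package.
        use vars qw($count);
        use subs qw(bump);
    }

    $count = 0;     # accepted by "use strict" outside the block
    bump;           # parsed as a call, no parentheses needed
    bump;
    print "count is $count\n";

    sub bump { $count++ }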

The following pragmas are defined (and have their own documentation).

=over 12

=item attributes

Get/set subroutine or variable attributes

=item attrs

Set/get attributes of a subroutine (deprecated)

=item autouse

Postpone load of modules until a function is used

=item base

Establish IS-A relationship with base class at compile time

=item blib

Use MakeMaker's uninstalled version of a package

=item bytes

Force byte semantics rather than character semantics

=item charnames

Define character names for C<\N{named}> string literal escape.

=item constant

Declare constants

=item diagnostics

Perl compiler pragma to force verbose warning diagnostics

=item fields

Compile-time class fields

=item filetest

Control the filetest permission operators

=item integer

Use integer arithmetic instead of floating point

=item less

Request less of something from the compiler

=item lib

Manipulate @INC at compile time

=item locale

Use and avoid POSIX locales for built-in operations

=item open

Set default disciplines for input and output

=item ops

Restrict unsafe operations when compiling

=item overload

Package for overloading perl operations

=item re

Alter regular expression behaviour

=item sigtrap

Enable simple signal handling

=item strict

Restrict unsafe constructs

=item subs

Predeclare sub names

=item utf8

Enable/disable UTF-8 in source code

=item vars

Predeclare global variable names (obsolete)

=item warnings

Control optional warnings

=item warnings::register

Warnings import function

=back

=head2 Standard Modules

Standard, bundled modules are all expected to behave in a well-defined
manner with respect to namespace pollution because they use the
Exporter module.  See their own documentation for details.

=over 12

=item AnyDBM_File

Provide framework for multiple DBMs

=item AutoLoader

Load subroutines only on demand

=item AutoSplit

Split a package for autoloading

=item B

The Perl Compiler

=item B::Asmdata

Autogenerated data about Perl ops, used to generate bytecode

=item B::Assembler

Assemble Perl bytecode

=item B::Bblock

Walk basic blocks

=item B::Bytecode

Perl compiler's bytecode backend

=item B::C

Perl compiler's C backend

=item B::CC

Perl compiler's optimized C translation backend

=item B::Concise

Walk Perl syntax tree, printing concise info about ops

=item B::Debug

Walk Perl syntax tree, printing debug info about ops

=item B::Deparse

Perl compiler backend to produce perl code

=item B::Disassembler

Disassemble Perl bytecode

=item B::Lint

Perl lint

=item B::Showlex

Show lexical variables used in functions or files

=item B::Stackobj

Helper module for CC backend

=item B::Stash

Show what stashes are loaded

=item B::Terse

Walk Perl syntax tree, printing terse info about ops

=item B::Xref

Generates cross reference reports for Perl programs

=item Benchmark

Benchmark running times of Perl code

=item ByteLoader

Load byte compiled perl code

=item CGI

Simple Common Gateway Interface Class

=item CGI::Apache

Backward compatibility module for CGI.pm

=item CGI::Carp

CGI routines for writing to the HTTPD (or other) error log

=item CGI::Cookie

Interface to Netscape Cookies

=item CGI::Fast

CGI Interface for Fast CGI

=item CGI::Pretty

Module to produce nicely formatted HTML code

=item CGI::Push

Simple Interface to Server Push

=item CGI::Switch

Backward compatibility module for defunct CGI::Switch

=item CGI::Util

Internal utilities used by CGI module

=item CPAN

Query, download and build perl modules from CPAN sites

=item CPAN::FirstTime

Utility for CPAN::Config file Initialization

=item CPAN::Nox

Wrapper around CPAN.pm without using any XS module

=item Carp

Warn of errors (from perspective of caller)

=item Carp::Heavy

Carp guts

=item Class::Struct

Declare struct-like datatypes as Perl classes

=item Cwd

Get pathname of current working directory

=item DB

Programmatic interface to the Perl debugging API (draft, subject to change)

=item DB_File

Perl5 access to Berkeley DB version 1.x

=item Devel::SelfStubber

Generate stubs for a SelfLoading module

=item DirHandle

Supply object methods for directory handles

=item Dumpvalue

Provides screen dump of Perl data.

=item English

Use nice English (or awk) names for ugly punctuation variables

=item Env

Perl module that imports environment variables as scalars or arrays

=item Exporter

Implements default import method for modules

=item Exporter::Heavy

Exporter guts

=item ExtUtils::Command

Utilities to replace common UNIX commands in Makefiles etc.

=item ExtUtils::Embed

Utilities for embedding Perl in C/C++ applications

=item ExtUtils::Install

Install files from here to there

=item ExtUtils::Installed

Inventory management of installed modules

=item ExtUtils::Liblist

Determine libraries to use and how to use them

=item ExtUtils::MM_Cygwin

Methods to override UN*X behaviour in ExtUtils::MakeMaker

=item ExtUtils::MM_OS2

Methods to override UN*X behaviour in ExtUtils::MakeMaker

=item ExtUtils::MM_Unix

Methods used by ExtUtils::MakeMaker

=item ExtUtils::MM_VMS

Methods to override UN*X behaviour in ExtUtils::MakeMaker

=item ExtUtils::MM_Win32

Methods to override UN*X behaviour in ExtUtils::MakeMaker

=item ExtUtils::MakeMaker

Create an extension Makefile

=item ExtUtils::Manifest

Utilities to write and check a MANIFEST file

=item ExtUtils::Mkbootstrap

Make a bootstrap file for use by DynaLoader

=item ExtUtils::Mksymlists

Write linker options files for dynamic extension

=item ExtUtils::Packlist

Manage .packlist files

=item ExtUtils::testlib

Add blib/* directories to @INC

=item Fatal

Replace functions with equivalents which succeed or die

=item Fcntl

Load the C Fcntl.h defines

=item File::Basename

Split a pathname into pieces

=item File::CheckTree

Run many filetest checks on a tree

=item File::Compare

Compare files or filehandles

=item File::Copy

Copy files or filehandles

=item File::DosGlob

DOS like globbing and then some

=item File::Find

Traverse a file tree

=item File::Path

Create or remove directory trees

=item File::Spec

Portably perform operations on file names

=item File::Spec::Epoc

Methods for Epoc file specs

=item File::Spec::Functions

Portably perform operations on file names

=item File::Spec::Mac

File::Spec for MacOS

=item File::Spec::OS2

Methods for OS/2 file specs

=item File::Spec::Unix

Methods used by File::Spec

=item File::Spec::VMS

Methods for VMS file specs

=item File::Spec::Win32

Methods for Win32 file specs

=item File::Temp

Return name and handle of a temporary file safely

=item File::stat

By-name interface to Perl's built-in stat() functions

=item FileCache

Keep more files open than the system permits

=item FileHandle

Supply object methods for filehandles

=item FindBin

Locate directory of original perl script

=item GDBM_File

Perl5 access to the gdbm library.

=item Getopt::Long

Extended processing of command line options

=item Getopt::Std

Process single-character switches with switch clustering

=item I18N::Collate

Compare 8-bit scalar data according to the current locale

=item IO

Load various IO modules

=item IPC::Open2

Open a process for both reading and writing

=item IPC::Open3

Open a process for reading, writing, and error handling

=item Math::BigFloat

Arbitrary length float math package

=item Math::BigInt

Arbitrary size integer math package

=item Math::Complex

Complex numbers and associated mathematical functions

=item Math::Trig

Trigonometric functions

=item Net::Ping

Check a remote host for reachability

=item Net::hostent

By-name interface to Perl's built-in gethost*() functions

=item Net::netent

By-name interface to Perl's built-in getnet*() functions

=item Net::protoent

By-name interface to Perl's built-in getproto*() functions

=item Net::servent

By-name interface to Perl's built-in getserv*() functions

=item O

Generic interface to Perl Compiler backends

=item Opcode

Disable named opcodes when compiling perl code

=item POSIX

Perl interface to IEEE Std 1003.1

=item Pod::Checker

Check pod documents for syntax errors

=item Pod::Find

Find POD documents in directory trees

=item Pod::Html

Module to convert pod files to HTML

=item Pod::InputObjects

Objects representing POD input paragraphs, commands, etc.

=item Pod::LaTeX

Convert Pod data to formatted Latex

=item Pod::Man

Convert POD data to formatted *roff input

=item Pod::ParseUtils

Helpers for POD parsing and conversion

=item Pod::Parser

Base class for creating POD filters and translators

=item Pod::Plainer

Perl extension for converting Pod to old style Pod.

=item Pod::Select

Extract selected sections of POD from input

=item Pod::Text

Convert POD data to formatted ASCII text

=item Pod::Text::Color

Convert POD data to formatted color ASCII text

=item Pod::Text::Overstrike

Convert POD data to formatted overstrike text

=item Pod::Text::Termcap

Convert POD data to ASCII text with format escapes

=item Pod::Usage

Print a usage message from embedded pod documentation

=item SDBM_File

Tied access to sdbm files

=item Safe

Compile and execute code in restricted compartments

=item Search::Dict

Search for key in dictionary file

=item SelectSaver

Save and restore selected file handle

=item SelfLoader

Load functions only on demand

=item Shell

Run shell commands transparently within perl

=item Socket

Load the C socket.h defines and structure manipulators 

=item Symbol

Manipulate Perl symbols and their names

=item Term::ANSIColor

Color screen output using ANSI escape sequences

=item Term::Cap

Perl termcap interface

=item Term::Complete

Perl word completion module

=item Term::ReadLine

Perl interface to various C<readline> packages. If no real package is
found, substitutes stubs instead of basic functions.

=item Test

Provides a simple framework for writing test scripts

=item Test::Harness

Run perl standard test scripts with statistics

=item Text::Abbrev

Create an abbreviation table from a list

=item Text::ParseWords

Parse text into an array of tokens or array of arrays

=item Text::Soundex

Implementation of the Soundex Algorithm as Described by Knuth

=item Text::Tabs

Expand and unexpand tabs per the unix expand(1) and unexpand(1)

=item Text::Wrap

Line wrapping to form simple paragraphs

=item Thread

Manipulate threads in Perl (EXPERIMENTAL, subject to change)

=item Thread::Queue

Thread-safe queues

=item Thread::Semaphore

Thread-safe semaphores

=item Thread::Signal

Start a thread which runs signal handlers reliably

=item Thread::Specific

Thread-specific keys

=item Tie::Array

Base class for tied arrays

=item Tie::Handle

Base class definitions for tied handles

=item Tie::Hash

Base class definitions for tied hashes

=item Tie::RefHash

Use references as hash keys

=item Tie::Scalar

Base class definitions for tied scalars

=item Tie::SubstrHash

Fixed-table-size, fixed-key-length hashing

=item Time::Local

Efficiently compute time from local and GMT time

=item Time::gmtime

By-name interface to Perl's built-in gmtime() function

=item Time::localtime

By-name interface to Perl's built-in localtime() function

=item Time::tm

Internal object used by Time::gmtime and Time::localtime

=item UNIVERSAL

Base class for ALL classes (blessed references)

=item User::grent

By-name interface to Perl's built-in getgr*() functions

=item User::pwent

By-name interface to Perl's built-in getpw*() functions

=item Win32

Interfaces to some Win32 API Functions

=back

To find out I<all> modules installed on your system, including
those without documentation or outside the standard release,
just do this:

    % find `perl -e 'print "@INC"'` -name '*.pm' -print

They should all have their own documentation installed and accessible
via your system man(1) command.  If you do not have a B<find>
program, you can use the Perl B<find2perl> program instead, which
generates Perl code as output you can run through perl.  If you
have a B<man> program but it doesn't find your modules, you'll have
to fix your manpath.  See L<perl> for details.  If you have no
system B<man> command, you might try the B<perldoc> program.
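
If you would rather stay within Perl, much the same list can be
produced with the standard File::Find module.  This is only a sketch;
C<installed_modules> is a hypothetical helper, not part of any module.

    use strict;
    use warnings;
    use File::Find;

    # Return a sorted list of the *.pm files found below the given
    # directories.  Entries that are not directories are skipped.
    sub installed_modules {
        my @dirs = grep { !ref($_) && -d $_ } @_;
        return () unless @dirs;
        my %seen;                   # nested @INC entries would repeat files
        find(sub {
            $seen{$File::Find::name} = 1 if /\.pm\z/ && -f $_;
        }, @dirs);
        return sort keys %seen;
    }

    print "$_\n" for installed_modules(@INC);

Like the B<find> command above, this lists files whether or not they
come with documentation.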

=head2 Extension Modules

Extension modules are written in C (or a mix of Perl and C).  They
are usually dynamically loaded into Perl if and when you need them,
but may also be linked in statically.  Supported extension modules
include Socket, Fcntl, and POSIX.

Many popular C extension modules do not come bundled (at least, not
completely) due to their sizes, volatility, or simply lack of time
for adequate testing and configuration across the multitude of
platforms on which Perl was beta-tested.  You are encouraged to
look for them on CPAN (described below), or using web search engines
like Alta Vista or Deja News.

=head1 CPAN

CPAN stands for Comprehensive Perl Archive Network; it's a globally
replicated trove of Perl materials, including documentation, style
guides, tricks and traps, alternate ports to non-Unix systems and
occasional binary distributions for these.   Search engines for
CPAN can be found at http://cpan.perl.com/ and at
http://theory.uwinnipeg.ca/mod_perl/cpan-search.pl .

Most importantly, CPAN includes around a thousand unbundled modules,
some of which require a C compiler to build.  Major categories of
modules are:

=over

=item *

Language Extensions and Documentation Tools

=item *

Development Support

=item *

Operating System Interfaces

=item *

Networking, Device Control (modems) and InterProcess Communication

=item *

Data Types and Data Type Utilities

=item *

Database Interfaces

=item *

User Interfaces

=item *

Interfaces to / Emulations of Other Programming Languages

=item *

File Names, File Systems and File Locking (see also File Handles)

=item *

String Processing, Language Text Processing, Parsing, and Searching

=item *

Option, Argument, Parameter, and Configuration File Processing

=item *

Internationalization and Locale

=item *

Authentication, Security, and Encryption

=item *

World Wide Web, HTML, HTTP, CGI, MIME

=item *

Server and Daemon Utilities

=item *

Archiving and Compression

=item *

Images, Pixmap and Bitmap Manipulation, Drawing, and Graphing

=item *

Mail and Usenet News

=item *

Control Flow Utilities (callbacks and exceptions etc)

=item *

File Handle and Input/Output Stream Utilities

=item *

Miscellaneous Modules

=back

Registered CPAN sites as of this writing include the following.
You should try to choose one close to you:

=head2 Africa

=over 4

=item *

South Africa

    ftp://ftp.is.co.za/programming/perl/CPAN/
    ftp://ftp.saix.net/pub/CPAN/
    ftp://ftpza.co.za/pub/mirrors/cpan/
    ftp://ftp.sun.ac.za/CPAN/

=back

=head2 Asia

=over 4

=item *

China

    ftp://freesoft.cei.gov.cn/pub/languages/perl/CPAN/
    http://www2.linuxforum.net/mirror/CPAN/
    http://cpan.shellhung.org/
    ftp://ftp.shellhung.org/pub/CPAN

=item *

Hong Kong

    http://CPAN.pacific.net.hk/
    ftp://ftp.pacific.net.hk/pub/mirror/CPAN/

=item *

Indonesia

    http://piksi.itb.ac.id/CPAN/
    ftp://mirrors.piksi.itb.ac.id/CPAN/
    http://CPAN.mweb.co.id/
    ftp://ftp.mweb.co.id/pub/languages/perl/CPAN/

=item *

Israel

    http://www.iglu.org.il:/pub/CPAN/
    ftp://ftp.iglu.org.il/pub/CPAN/
    http://bioinfo.weizmann.ac.il/pub/software/perl/CPAN/
    ftp://bioinfo.weizmann.ac.il/pub/software/perl/CPAN/

=item *

Japan

    ftp://ftp.u-aizu.ac.jp/pub/lang/perl/CPAN/
    ftp://ftp.kddlabs.co.jp/CPAN/
    http://mirror.nucba.ac.jp/mirror/Perl/
    ftp://mirror.nucba.ac.jp/mirror/Perl/
    ftp://ftp.meisei-u.ac.jp/pub/CPAN/
    ftp://ftp.jaist.ac.jp/pub/lang/perl/CPAN/
    ftp://ftp.dti.ad.jp/pub/lang/CPAN/
    ftp://ftp.ring.gr.jp/pub/lang/perl/CPAN/

=item *

Saudi Arabia

    ftp://ftp.isu.net.sa/pub/CPAN/

=item *

Singapore

    http://cpan.hjc.edu.sg
    http://ftp.nus.edu.sg/unix/perl/CPAN/
    ftp://ftp.nus.edu.sg/pub/unix/perl/CPAN/

=item *

South Korea

    http://CPAN.bora.net/
    ftp://ftp.bora.net/pub/CPAN/
    http://ftp.kornet.net/CPAN/
    ftp://ftp.kornet.net/pub/CPAN/
    ftp://ftp.nuri.net/pub/CPAN/

=item *

Taiwan

    ftp://coda.nctu.edu.tw/UNIX/perl/CPAN
    ftp://ftp.ee.ncku.edu.tw/pub/perl/CPAN/
    ftp://ftp1.sinica.edu.tw/pub1/perl/CPAN/

=item *

Thailand

    http://download.nectec.or.th/CPAN/
    ftp://ftp.nectec.or.th/pub/languages/CPAN/
    ftp://ftp.cs.riubon.ac.th/pub/mirrors/CPAN/

=back

=head2 Central America

=over 4

=item *

Costa Rica

    ftp://ftp.linux.co.cr/mirrors/CPAN/
    http://ftp.ucr.ac.cr/Unix/CPAN/
    ftp://ftp.ucr.ac.cr/pub/Unix/CPAN/

=back

=head2 Europe

=over 4

=item *

Austria

    ftp://ftp.tuwien.ac.at/pub/languages/perl/CPAN/

=item *

Belgium

    http://ftp.easynet.be/CPAN/
    ftp://ftp.easynet.be/CPAN/
    ftp://ftp.kulnet.kuleuven.ac.be/pub/mirror/CPAN/

=item *

Bulgaria

    ftp://ftp.ntrl.net/pub/mirrors/CPAN/

=item *

Croatia

    ftp://ftp.linux.hr/pub/CPAN/

=item *

Czech Republic

    http://www.fi.muni.cz/pub/perl/
    ftp://ftp.fi.muni.cz/pub/perl/
    ftp://sunsite.mff.cuni.cz/MIRRORS/ftp.funet.fi/pub/languages/perl/CPAN/

=item *

Denmark

    ftp://sunsite.auc.dk/pub/languages/perl/CPAN/
    http://www.cpan.dk/CPAN/
    ftp://www.cpan.dk/ftp.cpan.org/CPAN/

=item *

England

    http://www.mirror.ac.uk/sites/ftp.funet.fi/pub/languages/perl/CPAN
    ftp://ftp.mirror.ac.uk/sites/ftp.funet.fi/pub/languages/perl/CPAN/
    ftp://ftp.demon.co.uk/pub/mirrors/perl/CPAN/
    ftp://ftp.flirble.org/pub/languages/perl/CPAN/
    ftp://ftp.plig.org/pub/CPAN/
    ftp://sunsite.doc.ic.ac.uk/packages/CPAN/
    http://mirror.uklinux.net/CPAN/
    ftp://mirror.uklinux.net/pub/CPAN/
    ftp://usit.shef.ac.uk/pub/packages/CPAN/

=item *

Estonia

    ftp://ftp.ut.ee/pub/languages/perl/CPAN/

=item *

Finland

    ftp://ftp.funet.fi/pub/languages/perl/CPAN/

=item *

France

    ftp://cpan.ftp.worldonline.fr/pub/CPAN/
    ftp://ftp.club-internet.fr/pub/perl/CPAN/
    ftp://ftp.lip6.fr/pub/perl/CPAN/
    ftp://ftp.oleane.net/pub/mirrors/CPAN/
    ftp://ftp.pasteur.fr/pub/computing/CPAN/
    ftp://cpan.cict.fr/pub/CPAN/
    ftp://ftp.uvsq.fr/pub/perl/CPAN/

=item *

Germany

    ftp://ftp.rz.ruhr-uni-bochum.de/pub/CPAN/
    ftp://ftp.freenet.de/pub/ftp.cpan.org/pub/CPAN/
    ftp://ftp.uni-erlangen.de/pub/source/CPAN/
    ftp://ftp-stud.fht-esslingen.de/pub/Mirrors/CPAN
    ftp://ftp.gigabell.net/pub/CPAN/
    http://ftp.gwdg.de/pub/languages/perl/CPAN/
    ftp://ftp.gwdg.de/pub/languages/perl/CPAN/
    ftp://ftp.uni-hamburg.de/pub/soft/lang/perl/CPAN/
    ftp://ftp.leo.org/pub/comp/general/programming/languages/script/perl/CPAN/
    ftp://ftp.mpi-sb.mpg.de/pub/perl/CPAN/
    ftp://ftp.gmd.de/mirrors/CPAN/

=item *

Greece

    ftp://ftp.forthnet.gr/pub/languages/perl/CPAN
    ftp://ftp.ntua.gr/pub/lang/perl/

=item *

Hungary

    http://cpan.artifact.hu/
    ftp://cpan.artifact.hu/CPAN/
    ftp://ftp.kfki.hu/pub/packages/perl/CPAN/

=item *

Iceland

    http://cpan.gm.is/
    ftp://ftp.gm.is/pub/CPAN/

=item *

Ireland

    http://cpan.indigo.ie/
    ftp://cpan.indigo.ie/pub/CPAN/
    http://sunsite.compapp.dcu.ie/pub/perl/
    ftp://sunsite.compapp.dcu.ie/pub/perl/

=item *

Italy

    http://cpan.nettuno.it/
    http://gusp.dyndns.org/CPAN/
    ftp://gusp.dyndns.org/pub/CPAN
    http://softcity.iol.it/cpan
    ftp://softcity.iol.it/pub/cpan
    ftp://ftp.unina.it/pub/Other/CPAN/
    ftp://ftp.unipi.it/pub/mirror/perl/CPAN/
    ftp://cis.uniRoma2.it/CPAN/
    ftp://ftp.edisontel.it/pub/CPAN_Mirror/
    ftp://ftp.flashnet.it/pub/CPAN/

=item *

Latvia

    http://kvin.lv/pub/CPAN/

=item *

Netherlands

    ftp://download.xs4all.nl/pub/mirror/CPAN/
    ftp://ftp.nl.uu.net/pub/CPAN/
    ftp://ftp.nluug.nl/pub/languages/perl/CPAN/
    ftp://ftp.cpan.nl/pub/CPAN/
    http://www.cs.uu.nl/mirror/CPAN/
    ftp://ftp.cs.uu.nl/mirror/CPAN/

=item *

Norway

    ftp://sunsite.uio.no/pub/languages/perl/CPAN/
    ftp://ftp.uit.no/pub/languages/perl/cpan/

=item *

Poland

    ftp://ftp.pk.edu.pl/pub/lang/perl/CPAN/
    ftp://ftp.mega.net.pl/pub/mirrors/ftp.perl.com/
    ftp://ftp.man.torun.pl/pub/doc/CPAN/
    ftp://sunsite.icm.edu.pl/pub/CPAN/

=item *

Portugal

    ftp://ftp.ua.pt/pub/CPAN/
    ftp://perl.di.uminho.pt/pub/CPAN/
    ftp://ftp.ist.utl.pt/pub/CPAN/
    ftp://ftp.netc.pt/pub/CPAN/

=item *

Romania

    ftp://archive.logicnet.ro/mirrors/ftp.cpan.org/CPAN/
    ftp://ftp.kappa.ro/pub/mirrors/ftp.perl.org/pub/CPAN/
    ftp://ftp.dntis.ro/pub/cpan/
    ftp://ftp.opsynet.com/cpan/
    ftp://ftp.dnttm.ro/pub/CPAN/
    ftp://ftp.timisoara.roedu.net/mirrors/CPAN/

=item *

Russia

    ftp://ftp.chg.ru/pub/lang/perl/CPAN/
    http://cpan.rinet.ru/
    ftp://cpan.rinet.ru/pub/mirror/CPAN/
    ftp://ftp.aha.ru/pub/CPAN/
    ftp://ftp.sai.msu.su/pub/lang/perl/CPAN/

=item *

Slovakia

    ftp://ftp.entry.sk/pub/languages/perl/CPAN/

=item *

Slovenia

    ftp://ftp.arnes.si/software/perl/CPAN/

=item *

Spain

    ftp://ftp.rediris.es/mirror/CPAN/
    ftp://ftp.etse.urv.es/pub/perl/

=item *

Sweden

    http://ftp.du.se/CPAN/
    ftp://ftp.du.se/pub/CPAN/
    ftp://ftp.sunet.se/pub/lang/perl/CPAN/

=item *

Switzerland

    ftp://ftp.danyk.ch/CPAN/
    ftp://sunsite.cnlab-switch.ch/mirror/CPAN/

=item *

Turkey

    ftp://sunsite.bilkent.edu.tr/pub/languages/CPAN/

=back

=head2 North America

=over 4

=item *

Canada

=over 8

=item *

Alberta

    http://sunsite.ualberta.ca/pub/Mirror/CPAN/
    ftp://sunsite.ualberta.ca/pub/Mirror/CPAN/

=item *

Manitoba

    http://theoryx5.uwinnipeg.ca/pub/CPAN/
    ftp://theoryx5.uwinnipeg.ca/pub/CPAN/

=item *

Nova Scotia

    ftp://cpan.chebucto.ns.ca/pub/CPAN/

=item *

Ontario

    ftp://ftp.crc.ca/pub/packages/lang/perl/CPAN/

=back

=item *

Mexico

    http://www.msg.com.mx/CPAN/
    ftp://ftp.msg.com.mx/pub/CPAN/

=item *

United States

=over 8

=item *

Alabama

    http://mirror.hiwaay.net/CPAN/
    ftp://mirror.hiwaay.net/CPAN/

=item *

California

    http://www.cpan.org/
    ftp://ftp.cpan.org/CPAN/
    ftp://cpan.nas.nasa.gov/pub/perl/CPAN/
    ftp://ftp.digital.com/pub/plan/perl/CPAN/
    http://www.kernel.org/pub/mirrors/cpan/
    ftp://ftp.kernel.org/pub/mirrors/cpan/
    http://www.perl.com/CPAN/
    http://download.sourceforge.net/mirrors/CPAN/

=item *

Colorado

    ftp://ftp.cs.colorado.edu/pub/perl/CPAN/

=item *

Florida

    ftp://ftp.cise.ufl.edu/pub/perl/CPAN/

=item *

Georgia

    ftp://ftp.twoguys.org/CPAN/

=item *

Illinois

    http://www.neurogames.com/mirrors/CPAN
    http://uiarchive.uiuc.edu/mirrors/ftp/ftp.cpan.org/pub/CPAN/
    ftp://uiarchive.uiuc.edu/mirrors/ftp/ftp.cpan.org/pub/CPAN/

=item *

Indiana

    ftp://ftp.uwsg.indiana.edu/pub/perl/CPAN/
    http://cpan.nitco.com/
    ftp://cpan.nitco.com/pub/CPAN/
    ftp://cpan.in-span.net/
    http://csociety-ftp.ecn.purdue.edu/pub/CPAN
    ftp://csociety-ftp.ecn.purdue.edu/pub/CPAN

=item *

Kentucky

    http://cpan.uky.edu/
    ftp://cpan.uky.edu/pub/CPAN/

=item *

Massachusetts

    ftp://ftp.ccs.neu.edu/net/mirrors/ftp.funet.fi/pub/languages/perl/CPAN/
    ftp://ftp.iguide.com/pub/mirrors/packages/perl/CPAN/

=item *

New Jersey

    ftp://ftp.cpanel.net/pub/CPAN/

=item *

New York

    ftp://ftp.freesoftware.com/pub/perl/CPAN/
    http://www.deao.net/mirrors/CPAN/
    ftp://ftp.deao.net/pub/CPAN/
    ftp://ftp.stealth.net/pub/mirrors/ftp.cpan.org/pub/CPAN/
    http://mirror.nyc.anidea.com/CPAN/
    ftp://mirror.nyc.anidea.com/pub/CPAN/
    http://www.rge.com/pub/languages/perl/
    ftp://ftp.rge.com/pub/languages/perl/
    ftp://mirrors.cloud9.net/pub/mirrors/CPAN/

=item *

North Carolina

    ftp://ftp.duke.edu/pub/perl/

=item *

Ohio

    ftp://ftp.loaded.net/pub/CPAN/

=item *

Oklahoma

    ftp://ftp.ou.edu/mirrors/CPAN/

=item *

Oregon

    ftp://ftp.orst.edu/pub/packages/CPAN/

=item *

Pennsylvania

    http://ftp.epix.net/CPAN/
    ftp://ftp.epix.net/pub/languages/perl/
    ftp://carroll.cac.psu.edu/pub/CPAN/

=item *

Tennessee

    ftp://ftp.sunsite.utk.edu/pub/CPAN/

=item *

Texas

    http://ftp.sedl.org/pub/mirrors/CPAN/
    http://jhcloos.com/pub/mirror/CPAN/
    ftp://jhcloos.com/pub/mirror/CPAN/

=item *

Utah

    ftp://mirror.xmission.com/CPAN/

=item *

Virginia

    http://mirrors.rcn.net/pub/lang/CPAN/
    ftp://mirrors.rcn.net/pub/lang/CPAN/
    ftp://ruff.cs.jmu.edu/pub/CPAN/
    http://perl.Liquidation.com/CPAN/

=item *

Washington

    http://cpan.llarian.net/
    ftp://cpan.llarian.net/pub/CPAN/
    ftp://ftp-mirror.internap.com/pub/CPAN/
    ftp://ftp.spu.edu/pub/CPAN/

=back

=back

=head2 Oceania

=over 4

=item *

Australia

    http://ftp.planetmirror.com/pub/CPAN/
    ftp://ftp.planetmirror.com/pub/CPAN/
    ftp://mirror.aarnet.edu.au/pub/perl/CPAN/
    ftp://cpan.topend.com.au/pub/CPAN/

=item *

New Zealand

    ftp://ftp.auckland.ac.nz/pub/perl/CPAN/

=back

=head2 South America

=over 4

=item *

Argentina

    ftp://mirrors.bannerlandia.com.ar/mirrors/CPAN/

=item *

Brazil

    ftp://cpan.pop-mg.com.br/pub/CPAN/
    ftp://ftp.matrix.com.br/pub/perl/
    ftp://cpan.if.usp.br/pub/mirror/CPAN/

=item *

Chile

    ftp://ftp.psinet.cl/pub/programming/perl/CPAN/
    ftp://sunsite.dcc.uchile.cl/pub/lang/perl/

=back

For an up-to-date listing of CPAN sites,
see http://www.cpan.org/SITES or ftp://www.cpan.org/SITES .

=head1 Modules: Creation, Use, and Abuse

(The following section is borrowed directly from Tim Bunce's modules
file, available at your nearest CPAN site.)

Perl implements a class using a package, but the presence of a
package doesn't imply the presence of a class.  A package is just a
namespace.  A class is a package that provides subroutines that can be
used as methods.  A method is just a subroutine that expects, as its
first argument, either the name of a package (for "static" methods),
or a reference to something (for "virtual" methods).
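
As a minimal sketch of the distinction (the C<Counter> class and its
methods are hypothetical names invented for this illustration), a
"static" method receives the package name and a "virtual" method
receives a blessed reference:

 use strict;
 use warnings;

 package Counter;    # hypothetical class, for illustration only

 # "Static" method: the first argument is the package name.
 sub new {
     my ($class, $start) = @_;
     return bless { count => $start || 0 }, $class;
 }

 # "Virtual" method: the first argument is a blessed reference.
 sub increment {
     my $self = shift;
     return ++$self->{count};
 }

 package main;

 my $c = Counter->new(5);    # new() receives the string "Counter"
 $c->increment;              # increment() receives $c itself
 print $c->increment, "\n";  # prints 7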

A module is a file that (by convention) provides a class of the same
name (sans the .pm), plus an import method in that class that can be
called to fetch exported symbols.  This module may implement some of
its methods by loading dynamic C or C++ objects, but that should be
totally transparent to the user of the module.  Likewise, the module
might set up an AUTOLOAD function to slurp in subroutine definitions on
demand, but this is also transparent.  Only the F<.pm> file is required to
exist.  See L<perlsub>, L<perltoot>, and L<AutoLoader> for details about
the AUTOLOAD mechanism.

=head2 Guidelines for Module Creation

=over 4

=item  *

Do similar modules already exist in some form?

If so, please try to reuse the existing modules either in whole or
by inheriting useful features into a new class.  If this is not
practical try to get together with the module authors to work on
extending or enhancing the functionality of the existing modules.
A perfect example is the plethora of packages in perl4 for dealing
with command line options.

If you are writing a module to expand an already existing set of
modules, please coordinate with the author of the package.  It
helps if you follow the same naming scheme and module interaction
scheme as the original author.

=item  *

Try to design the new module to be easy to extend and reuse.

Try to C<use warnings;> (or C<use warnings qw(...);>).
Remember that you can add C<no warnings qw(...);> to individual blocks
of code that need fewer warnings.

Use blessed references.  Use the two argument form of bless to bless
into the class name given as the first parameter of the constructor,
e.g.:

 sub new {
     my $class = shift;
     return bless {}, $class;
 }

or even this if you'd like it to be used as either a static
or a virtual method.

 sub new {
     my $self  = shift;
     my $class = ref($self) || $self;
     return bless {}, $class;
 }

Pass arrays as references so more parameters can be added later
(it's also faster).  Convert functions into methods where
appropriate.  Split large methods into smaller more flexible ones.
Inherit methods from other modules if appropriate.
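
The following sketch shows why a reference leaves room for growth.
The C<total> function and its C<scale> option are hypothetical; the
point is only that the list travels as a single argument, so named
options can follow it without breaking existing callers:

 use strict;
 use warnings;

 # Hypothetical function: an array reference plus optional named options.
 sub total {
     my ($numbers, %opt) = @_;
     my $sum = 0;
     $sum += $_ for @$numbers;
     return $opt{scale} ? $sum * $opt{scale} : $sum;
 }

 my @values = (1, 2, 3);
 print total(\@values), "\n";                # prints 6
 print total(\@values, scale => 10), "\n";   # prints 60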

Avoid class name tests like: C<die "Invalid" unless ref $ref eq 'FOO'>.
Generally you can delete the C<eq 'FOO'> part with no harm at all.
Let the objects look after themselves! Generally, avoid hard-wired
class names as far as possible.

Avoid C<< $r->Class::func() >> where using C<@ISA=qw(... Class ...)> and
C<< $r->func() >> would work (see L<perlbot> for more details).

Use autosplit so little-used or newly added functions won't be a
burden to programs that don't use them. Add test functions to
the module after __END__ either using AutoSplit or by saying:

 eval join('',<main::DATA>) || die $@ unless caller();

Does your module pass the 'empty subclass' test? If you say
C<@SUBCLASS::ISA = qw(YOURCLASS);> your applications should be able
to use SUBCLASS in exactly the same way as YOURCLASS.  For example,
does your application still work if you change:  C<$obj = new YOURCLASS;>
into: C<$obj = new SUBCLASS;> ?
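
A small sketch of the test follows.  The C<describe> method is a
hypothetical addition; C<YOURCLASS> and C<SUBCLASS> are the placeholder
names used above.  Because the constructor blesses into whatever class
it was called on, the empty subclass behaves exactly like its parent:

 use strict;
 use warnings;

 package YOURCLASS;

 sub new {
     my $self  = shift;
     my $class = ref($self) || $self;
     return bless {}, $class;
 }

 # Hypothetical method, used only to show which class the object is in.
 sub describe { my $self = shift; return "I am a " . ref($self) }

 package SUBCLASS;
 our @ISA = qw(YOURCLASS);   # the "empty subclass": it adds nothing

 package main;

 my $obj = SUBCLASS->new;
 print $obj->describe, "\n";   # prints "I am a SUBCLASS"

Had C<new> hard-wired C<bless {}, 'YOURCLASS'>, the object would have
ended up in the parent class and the test would fail.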

Avoid keeping any state information in your packages. It makes it
difficult for multiple other packages to use yours. Keep state
information in objects.

Always use B<-w>.

Try to C<use strict;> (or C<use strict qw(...);>).
Remember that you can add C<no strict qw(...);> to individual blocks
of code that need less strictness.

Always use B<-w>.

Follow the guidelines in the perlstyle(1) manual.

Always use B<-w>.

=item  *

Some simple style guidelines

The perlstyle manual supplied with Perl has many helpful points.

Coding style is a matter of personal taste. Many people evolve their
style over several years as they learn what helps them write and
maintain good code.  Here's one set of assorted suggestions that
seem to be widely used by experienced developers:

Use underscores to separate words.  It is generally easier to read
$var_names_like_this than $VarNamesLikeThis, especially for
non-native speakers of English. It's also a simple rule that works
consistently with VAR_NAMES_LIKE_THIS.

Package/Module names are an exception to this rule. Perl informally
reserves lowercase module names for 'pragma' modules like integer
and strict. Other modules normally begin with a capital letter and
use mixed case with no underscores (need to be short and portable).

You may find it helpful to use letter case to indicate the scope
or nature of a variable. For example:

 $ALL_CAPS_HERE   constants only (beware clashes with Perl vars)
 $Some_Caps_Here  package-wide global/static
 $no_caps_here    function scope my() or local() variables

Function and method names seem to work best as all lowercase.
e.g., C<< $obj->as_string() >>.

You can use a leading underscore to indicate that a variable or
function should not be used outside the package that defined it.

=item  *

Select what to export.

Do NOT export method names!

Do NOT export anything else by default without a good reason!

Exports pollute the namespace of the module user.  If you must
export try to use @EXPORT_OK in preference to @EXPORT and avoid
short or common names to reduce the risk of name clashes.

Generally anything not exported is still accessible from outside the
module using the ModuleName::item_name (or C<< $blessed_ref->method >>)
syntax.  By convention you can use a leading underscore on names to
indicate informally that they are 'internal' and not for public use.

(It is actually possible to get private functions by saying:
C<my $subref = sub { ... };  &$subref;>.  But there's no way to call that
directly as a method, because a method must have a name in the symbol
table.)

As a general rule, if the module is trying to be object oriented
then export nothing. If it's just a collection of functions then
@EXPORT_OK anything but use @EXPORT with caution.
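
As an illustration, here is a sketch of a function collection that
exports nothing by default.  The module name C<My::Util> and its
C<trim> function are hypothetical:

 use strict;
 use warnings;

 package My::Util;            # hypothetical module name
 require Exporter;
 our @ISA       = qw(Exporter);
 our @EXPORT    = ();         # nothing is exported by default
 our @EXPORT_OK = qw(trim);   # exported only when asked for

 sub trim { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s }

 package main;

 # Stand-in for "use My::Util qw(trim);", needed here only because
 # the package lives in this file rather than in My/Util.pm.
 My::Util->import('trim');

 print trim("  hello  "), "\n";            # the imported name
 print My::Util::trim("  world  "), "\n";  # the full name always works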

=item  *

Select a name for the module.

This name should be as descriptive, accurate, and complete as
possible.  Avoid any risk of ambiguity. Always try to use two or
more whole words.  Generally the name should reflect what is special
about what the module does rather than how it does it.  Please use
nested module names to group informally or categorize a module.
There should be a very good reason for a module not to have a nested name.
Module names should begin with a capital letter.

Having 57 modules all called Sort will not make life easy for anyone
(though having 23 called Sort::Quick is only marginally better :-).
Imagine someone trying to install your module alongside many others.
If in any doubt ask for suggestions in comp.lang.perl.misc.

If you are developing a suite of related modules/classes it's good
practice to use nested classes with a common prefix as this will
avoid namespace clashes. For example: Xyz::Control, Xyz::View,
Xyz::Model etc. Use the modules in this list as a naming guide.

If adding a new module to a set, follow the original author's
standards for naming modules and the interface to methods in
those modules.

If developing modules for private internal or project specific use,
that will never be released to the public, then you should ensure
that their names will not clash with any future public module. You
can do this either by using the reserved Local::* category or by
using a category name that includes an underscore like Foo_Corp::*.

To be portable each component of a module name should be limited to
11 characters. If it might be used on MS-DOS then try to ensure each is
unique in the first 8 characters. Nested modules make this easier.

=item  *

Have you got it right?

How do you know that you've made the right decisions? Have you
picked an interface design that will cause problems later? Have
you picked the most appropriate name? Do you have any questions?

The best way to know for sure, and pick up many helpful suggestions,
is to ask someone who knows. Comp.lang.perl.misc is read by just about
all the people who develop modules and it's the best place to ask.

All you need to do is post a short summary of the module, its
purpose and interfaces. A few lines on each of the main methods is
probably enough. (If you post the whole module it might be ignored
by busy people - generally the very people you want to read it!)

Don't worry about posting if you can't say when the module will be
ready - just say so in the message. It might be worth inviting
others to help you, they may be able to complete it for you!

=item  *

README and other Additional Files.

It's well known that software developers usually fully document the
software they write. If, however, the world is in urgent need of
your software and there is not enough time to write the full
documentation please at least provide a README file containing:

=over 10

=item *

A description of the module/package/extension etc.

=item *

A copyright notice - see below.

=item *

Prerequisites - what else you may need to have.

=item *

How to build it - possible changes to Makefile.PL etc.

=item *

How to install it.

=item *

Recent changes in this release, especially incompatibilities

=item *

Changes / enhancements you plan to make in the future.

=back

If the README file seems to be getting too large you may wish to
split out some of the sections into separate files: INSTALL,
Copying, ToDo etc.

=over 4

=item Adding a Copyright Notice.


How you choose to license your work is a personal decision.
The general mechanism is to assert your Copyright and then make
a declaration of how others may copy/use/modify your work.

Perl, for example, is supplied with two types of licence: The GNU
GPL and The Artistic Licence (see the files README, Copying, and
Artistic).  Larry has good reasons for NOT just using the GNU GPL.

My personal recommendation, out of respect for Larry, Perl, and the
Perl community at large is to state something simply like:

 Copyright (c) 1995 Your Name. All rights reserved.
 This program is free software; you can redistribute it and/or
 modify it under the same terms as Perl itself.

This statement should at least appear in the README file. You may
also wish to include it in a Copying file and your source files.
Remember to include the other words in addition to the Copyright.

=item  *

Give the module a version/issue/release number.

To be fully compatible with the Exporter and MakeMaker modules you
should store your module's version number in a non-my package
variable called $VERSION.  This should be a floating point
number with at least two digits after the decimal (i.e., hundredths,
e.g., C<$VERSION = "0.01">).  Don't use a "1.3.2" style version.
See L<Exporter> for details.

It may be handy to add a function or method to retrieve the number.
Use the number in announcements and archive file names when
releasing the module (ModuleName-1.02.tar.Z).
See L<ExtUtils::MakeMaker> for details.
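
A brief sketch, using the hypothetical module name C<My::Module>:
every class inherits a C<VERSION> method from C<UNIVERSAL>, which
reads the package's C<$VERSION> and can also check it against a
required minimum.

 use strict;
 use warnings;

 package My::Module;      # hypothetical module name
 our $VERSION = '1.02';   # a package variable, not a my() variable

 package main;

 # With no argument, VERSION simply returns $VERSION.
 print My::Module->VERSION, "\n";

 # With an argument, it dies if $VERSION is lower than requested.
 eval { My::Module->VERSION(2.00) };
 print "My::Module is too old\n" if $@;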

=item  *

How to release and distribute a module.

It's a good idea to post an announcement of the availability of your
module (or the module itself if small) to the comp.lang.perl.announce
Usenet newsgroup.  This will at least ensure very wide once-off
distribution.

If possible, register the module with CPAN.  You should
include details of its location in your announcement.

Some notes about ftp archives: Please use a long descriptive file
name that includes the version number. Most incoming directories
will not be readable/listable, i.e., you won't be able to see your
file after uploading it. Remember to send your email notification
message as soon as possible after uploading else your file may get
deleted automatically. Allow time for the file to be processed
and/or check the file has been processed before announcing its
location.

FTP Archives for Perl Modules:

Follow the instructions and links on:

   http://www.cpan.org/modules/00modlist.long.html
   http://www.cpan.org/modules/04pause.html

or upload to one of these sites:

   https://pause.kbx.de/pause/
   http://pause.perl.org/pause/

and notify <modules@perl.org>.

By using the WWW interface you can ask the Upload Server to mirror
your modules from your ftp or WWW site into your own directory on
CPAN!

Please remember to send me an updated entry for the Module list!

=item  *

Take care when changing a released module.

Always strive to remain compatible with previous released versions.
Otherwise try to add a mechanism to revert to the
old behavior if people rely on it.  Document incompatible changes.

=back

=back

=head2 Guidelines for Converting Perl 4 Library Scripts into Modules

=over 4

=item  *

There is no requirement to convert anything.

If it ain't broke, don't fix it! Perl 4 library scripts should
continue to work with no problems. You may need to make some minor
changes (like escaping non-array @'s in double quoted strings) but
there is no need to convert a .pl file into a Module for just that.

=item  *

Consider the implications.

All Perl applications that make use of the script will need to
be changed (slightly) if the script is converted into a module.  Is
it worth it unless you plan to make other changes at the same time?

=item  *

Make the most of the opportunity.

If you are going to convert the script to a module you can use the
opportunity to redesign the interface.  The guidelines for module
creation above include many of the issues you should consider.

=item  *

The pl2pm utility will get you started.

This utility will read *.pl files (given as parameters) and write
corresponding *.pm files. The pl2pm utility does the following:

=over 10

=item *

Adds the standard Module prologue lines

=item *

Converts package specifiers from ' to ::

=item *

Converts die(...) to croak(...)

=item *

Several other minor changes

=back

Being a mechanical process pl2pm is not bullet proof. The converted
code will need careful checking, especially any package statements.
Don't delete the original .pl file till the new .pm one works!

=back

=head2 Guidelines for Reusing Application Code

=over 4

=item  *

Complete applications rarely belong in the Perl Module Library.

=item  *

Many applications contain some Perl code that could be reused.

Help save the world! Share your code in a form that makes it easy
to reuse.

=item  *

Break-out the reusable code into one or more separate module files.

=item  *

Take the opportunity to reconsider and redesign the interfaces.

=item  *

In some cases the 'application' can then be reduced to a small
fragment of code built on top of the reusable modules. In these cases
the application could be invoked as:

     % perl -e 'use Module::Name; method(@ARGV)' ...

or

     % perl -mModule::Name ...    (in perl5.002 or higher)

=back

=head1 NOTE

Perl does not enforce private and public parts of its modules as you may
have been used to in other languages like C++, Ada, or Modula-17.  Perl
doesn't have an infatuation with enforced privacy.  It would prefer
that you stayed out of its living room because you weren't invited, not
because it has a shotgun.

The module and its user have a contract, part of which is common law,
and part of which is "written".  Part of the common law contract is
that a module doesn't pollute any namespace it wasn't asked to.  The
written contract for the module (A.K.A. documentation) may make other
provisions.  But then you know when you C<use RedefineTheWorld> that
you're redefining the world and willing to take the consequences.

=head1 NAME

perlnewmod - preparing a new module for distribution

=head1 DESCRIPTION

This document gives you some suggestions about how to go about writing
Perl modules, preparing them for distribution, and making them available
via CPAN.

One of the things that makes Perl really powerful is the fact that Perl
hackers tend to want to share the solutions to problems they've faced,
so you and I don't have to battle with the same problem again.

The main way they do this is by abstracting the solution into a Perl
module. If you don't know what one of these is, the rest of this
document isn't going to be much use to you. You're also missing out on
an awful lot of useful code; consider having a look at L<perlmod>,
L<perlmodlib> and L<perlmodinstall> before coming back here.

When you've found that there isn't a module available for what you're
trying to do, and you've had to write the code yourself, consider
packaging up the solution into a module and uploading it to CPAN so that
others can benefit.

=head2 Warning

We're going to primarily concentrate on Perl-only modules here, rather
than XS modules. XS modules serve a rather different purpose, and
you should consider different things before distributing them - the
popularity of the library you are gluing, the portability to other
operating systems, and so on. However, the notes on preparing the Perl
side of the module and packaging and distributing it will apply equally
well to an XS module as a pure-Perl one.

=head2 What should I make into a module?

You should make a module out of any code that you think is going to be
useful to others. Anything that's likely to fill a hole in the communal
library and which someone else can slot directly into their program. Any
part of your code which you can isolate and extract and plug into
something else is a likely candidate.

Let's take an example. Suppose you're reading in data from a local
format into a hash-of-hashes in Perl, turning that into a tree, walking
the tree and then piping each node to an Acme Transmogrifier Server.

Now, quite a few people have the Acme Transmogrifier, and you've had to
write something to talk the protocol from scratch - you'd almost
certainly want to make that into a module. The level at which you pitch
it is up to you: you might want protocol-level modules analogous to
L<Net::SMTP|Net::SMTP> which then talk to higher level modules analogous
to L<Mail::Send|Mail::Send>. The choice is yours, but you do want to get
a module out for that server protocol.

Nobody else on the planet is going to talk your local data format, so we
can ignore that. But what about the thing in the middle? Building tree
structures from Perl variables and then traversing them is a nice,
general problem, and if nobody's already written a module that does
that, you might want to modularise that code too.

So hopefully you've now got a few ideas about what's good to modularise.
Let's now see how it's done.

=head2 Step-by-step: Preparing the ground

Before we even start scraping out the code, there are a few things we'll
want to do in advance.

=over 3

=item Look around

Dig into a bunch of modules to see how they're written. I'd suggest
starting with L<Text::Tabs|Text::Tabs>, since it's in the standard
library and is nice and simple, and then looking at something like
L<Time::Zone|Time::Zone>, L<File::Copy|File::Copy> and then some of the
C<Mail::*> modules if you're planning on writing object oriented code.

These should give you an overall feel for how modules are laid out and
written.

=item Check it's new

There are a lot of modules on CPAN, and it's easy to miss one that's
similar to what you're planning on contributing. Have a good plough
through the modules list and the F<by-module> directories, and make sure
you're not the one reinventing the wheel!

=item Discuss the need

You might love it. You might feel that everyone else needs it. But there
might not actually be any real demand for it out there. If you're unsure
about the demand your module will have, consider sending out feelers
on the C<comp.lang.perl.modules> newsgroup, or as a last resort, ask the
modules list at C<modules@perl.org>. Remember that this is a closed list
with a very long turn-around time - be prepared to wait a good while for
a response from them.

=item Choose a name

Perl modules included on CPAN have a naming hierarchy you should try to
fit in with. See L<perlmodlib> for more details on how this works, and
browse around CPAN and the modules list to get a feel of it. At the very
least, remember this: modules should be title capitalised (This::Thing),
fit in with a category, and explain their purpose succinctly.

=item Check again

While you're doing that, make really sure you haven't missed a module
similar to the one you're about to write.

When you've got your name sorted out and you're sure that your module is
wanted and not currently available, it's time to start coding.

=back

=head2 Step-by-step: Making the module

=over 3

=item Start with F<h2xs>

Originally a utility to convert C header files into XS modules,
L<h2xs|h2xs> has become a useful utility for churning out skeletons for
Perl-only modules as well. If you don't want to use the
L<AutoLoader|AutoLoader> which splits up big modules into smaller
subroutine-sized chunks, you'll say something like this:

    h2xs -AX -n Net::Acme

The C<-A> omits the AutoLoader code, C<-X> omits XS elements, and C<-n>
specifies the name of the module.

=item Use L<strict|strict> and L<warnings|warnings>

A module's code has to be warning and strict-clean, since you can't
guarantee the conditions that it'll be used under. Besides, you wouldn't
want to distribute code that wasn't warning or strict-clean anyway,
right?

=item Use L<Carp|Carp>

The L<Carp|Carp> module allows you to present your error messages from
the caller's perspective; this gives you a way to signal a problem with
the caller and not your module. For instance, if you say this:

    warn "No hostname given";

the user will see something like this:

    No hostname given at /usr/local/lib/perl5/site_perl/5.6.0/Net/Acme.pm
    line 123.

which looks like your module is doing something wrong. Instead, you want
to put the blame on the user, and say this:

    No hostname given at bad_code, line 10.

You do this by using L<Carp|Carp> and replacing your C<warn>s with
C<carp>s. If you need to C<die>, say C<croak> instead. However, keep
C<warn> and C<die> in place for your sanity checks - where it really is
your module at fault.
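
Here is a minimal sketch of the idea.  The C<connect_to> method is a
hypothetical name, not part of any real C<Net::Acme> interface:

    use strict;
    use warnings;

    package Net::Acme;
    use Carp;

    # Hypothetical method that blames the caller for a missing argument.
    sub connect_to {
        my ($class, $host) = @_;
        croak "No hostname given" unless defined $host;
        return "connected to $host";
    }

    package main;

    eval { Net::Acme->connect_to() };   # this is the line that gets blamed
    print "Caught: $@";

The message caught here names the file and line of the call in
C<main>, not the line inside C<Net::Acme> where C<croak> was called.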

=item Use L<Exporter|Exporter> - wisely!

C<h2xs> provides stubs for L<Exporter|Exporter>, which gives you a
standard way of exporting symbols and subroutines from your module into
the caller's namespace. For instance, saying C<use Net::Acme qw(&frob)>
would import the C<frob> subroutine.

The package variable C<@EXPORT> will determine which symbols will get
exported when the caller simply says C<use Net::Acme> - you will hardly
ever want to put anything in there. C<@EXPORT_OK>, on the other hand,
specifies which symbols you're willing to export. If you do want to
export a bunch of symbols, use the C<%EXPORT_TAGS> and define a standard
export set - look at L<Exporter> for more details.
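
The three variables might be set up as in the following sketch, where
C<unfrob> is a hypothetical second subroutine added for illustration:

    use strict;
    use warnings;

    package Net::Acme;
    require Exporter;
    our @ISA         = qw(Exporter);
    our @EXPORT      = ();                         # nothing by default
    our @EXPORT_OK   = qw(frob unfrob);            # available on request
    our %EXPORT_TAGS = ( all => [ @EXPORT_OK ] );  # use Net::Acme qw(:all)

    sub frob   { return "frobbed: $_[0]" }
    sub unfrob { return "unfrobbed: $_[0]" }

    package main;

    # Stand-in for "use Net::Acme qw(:all);", needed here only because
    # the package lives in this file rather than in Net/Acme.pm.
    Net::Acme->import(':all');

    print frob("widget"), "\n";
    print unfrob("widget"), "\n";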

=item Use L<plain old documentation|perlpod>

The work isn't over until the paperwork is done, and you're going to
need to put in some time writing some documentation for your module.
C<h2xs> will provide a stub for you to fill in; if you're not sure about
the format, look at L<perlpod> for an introduction. Provide a good
synopsis of how your module is used in code, a description, and then
notes on the syntax and function of the individual subroutines or
methods. Use Perl comments for developer notes and POD for end-user
notes.

=item Write tests

You're encouraged to create self-tests for your module to ensure it's
working as intended on the myriad platforms Perl supports; if you upload
your module to CPAN, a host of testers will build your module and send
you the results of the tests. Again, C<h2xs> provides a test framework
which you can extend - you should do something more than just checking
your module will compile.
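
As a rough sketch of what "something more" could look like, here is a
short test script written with L<Test::More>.  This assumes
C<Test::More> is available on your system; the stub that C<h2xs>
generates may use a different test module.  The C<frob> function is a
hypothetical stand-in for your module's real interface:

    use strict;
    use warnings;
    use Test::More tests => 3;

    # Hypothetical function under test.  A real test file would
    # instead load the module and exercise its public interface.
    sub frob { return defined $_[0] ? "frobbed: $_[0]" : undef }

    ok( defined &frob,                     'frob is defined' );
    is( frob('widget'), 'frobbed: widget', 'frob handles a plain string' );
    is( frob(undef),    undef,             'frob passes undef through' );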

=item Write the README

If you're uploading to CPAN, the automated gremlins will extract the
README file and place that in your CPAN directory. It'll also appear in
the main F<by-module> and F<by-category> directories if you make it onto
the modules list. It's a good idea to put here what the module actually
does in detail, and the user-visible changes since the last release.

=back

=head2 Step-by-step: Distributing your module

=over 3

=item Get a CPAN user ID

Every developer publishing modules on CPAN needs a CPAN ID. See the
instructions at C<http://www.cpan.org/modules/04pause.html> (or
equivalent on your nearest mirror) to find out how to do this.

=item C<perl Makefile.PL; make test; make dist>

Once again, C<h2xs> has done all the work for you. It produces the
standard C<Makefile.PL> you'll have seen when you downloaded and
installed modules, and this produces a Makefile with a C<dist> target.

Once you've ensured that your module passes its own tests - always a
good thing to make sure - you can C<make dist>, and the Makefile will
hopefully produce you a nice tarball of your module, ready for upload.

=item Upload the tarball

The email you got when you received your CPAN ID will tell you how to
log in to PAUSE, the Perl Authors Upload SErver. From the menus there,
you can upload your module to CPAN.

=item Announce to the modules list

Once uploaded, it'll sit unnoticed in your author directory. If you want
it connected to the rest of the CPAN, you'll need to tell the modules
list about it. The best way to do this is to email them a line in the
style of the modules list, like this:

    Net::Acme bdpO  Interface to Acme Frobnicator servers         FOOBAR
    ^         ^^^^  ^                                             ^
    |         ||||  Module description                            Your ID
    |         ||||
    |         |||\- Interface: (O)OP, (r)eferences, (h)ybrid, (f)unctions
    |         |||
    |         ||\-- Language: (p)ure Perl, C(+)+, (h)ybrid, (C), (o)ther
    |         ||
    Module    |\--- Support: (d)eveloper, (m)ailing list, (u)senet, (n)one
    Name      |
              \---- Maturity: (i)dea, (c)onstructions, (a)lpha, (b)eta,
                              (R)eleased, (M)ature, (S)tandard

plus a description of the module and why you think it should be
included. If you hear nothing back, that means your module will
probably appear on the modules list at the next update. Don't try
subscribing to C<modules@perl.org>; it's not another mailing list. Just
have patience.

=item Announce to clpa

If you have a burning desire to tell the world about your release, post
an announcement to the moderated C<comp.lang.perl.announce> newsgroup.

=item Fix bugs!

Once you start accumulating users, they'll send you bug reports. If
you're lucky, they'll even send you patches. Welcome to the joys of
maintaining a software project...

=back

=head1 AUTHOR

Simon Cozens, C<simon@cpan.org>

=head1 SEE ALSO

L<perlmod>, L<perlmodlib>, L<perlmodinstall>, L<h2xs>, L<strict>,
L<Carp>, L<Exporter>, L<perlpod>, L<Test>, L<ExtUtils::MakeMaker>,
http://www.cpan.org/

=head1 NAME

perlnumber - semantics of numbers and numeric operations in Perl

=head1 SYNOPSIS

    $n = 1234;			# decimal integer
    $n = 0b1110011;		# binary integer
    $n = 01234;			# octal integer
    $n = 0x1234;		# hexadecimal integer
    $n = 12.34e-56;		# exponential notation
    $n = "-12.34e56";		# number specified as a string
    $n = "1234";		# number specified as a string
    $n = v49.50.51.52;		# number specified as a string, which in
				# turn is specified in terms of numbers :-)

=head1 DESCRIPTION

This document describes how Perl internally handles numeric values.

Perl's operator overloading facility is completely ignored here.  Operator
overloading allows user-defined behaviors for numbers, such as operations
over arbitrarily large integers, floating point numbers with arbitrary
precision, operations over "exotic" numbers such as modular arithmetic or
p-adic arithmetic, and so on.  See L<overload> for details.

=head1 Storing numbers

Perl can internally represent numbers in 3 different ways: as native
integers, as native floating point numbers, and as decimal strings.
Decimal strings may have an exponential notation part, as in C<"12.34e-56">.
I<Native> here means "a format supported by the C compiler which was used
to build perl".

The term "native" does not mean quite as much when we talk about native
integers, as it does when native floating point numbers are involved.
The only implication of the term "native" on integers is that the limits for
the maximal and the minimal supported true integral quantities are close to
powers of 2.  However, "native" floats have a most fundamental
restriction: they may represent only those numbers which have a relatively
"short" representation when converted to a binary fraction.  For example,
0.9 cannot be represented by a native float, since the binary fraction
for 0.9 is infinite:

  binary0.1110011001100...

with the sequence C<1100> repeating again and again.  In addition to this
limitation, the exponent of the binary number is also restricted when it
is represented as a floating point number.  On typical hardware, floating
point values can store numbers with up to 53 binary digits, and with binary
exponents between -1024 and 1024.  In decimal representation this is close
to 16 decimal digits and decimal exponents in the range of -308..308.
The upshot of all this is that Perl cannot store a number like
12345678901234567 as a floating point number on such architectures without
loss of information.
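
The effect of the "short binary fraction" restriction can be observed
directly.  The following is only a minimal sketch: it assumes a perl whose
native floats are IEEE 754 doubles (the common case), and the variable
name is illustrative.

    # Assumes native floats are IEEE 754 doubles (53 binary digits).
    my $sum = 0;
    $sum += 0.1 for 1 .. 10;    # 0.1 has no finite binary expansion

    printf "%.17g\n", 0.9;      # the nearest representable value to 0.9
    printf "%.17g\n", $sum;     # the accumulated rounding error is visible
    print "sum is not exactly 1\n" if $sum != 1;

On builds that use a wider floating point type the printed digits will
differ, but the principle is the same.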

Similarly, decimal strings can represent only those numbers which have a
finite decimal expansion.  Being strings, and thus of arbitrary length, there
is no practical limit for the exponent or number of decimal digits for these
numbers.  (But realize that we are discussing the rules for just the
I<storage> of these numbers.  The fact that you can store such "large" numbers
does not mean that the I<operations> over these numbers will use all
of the significant digits.
See L<"Numeric operators and numeric conversions"> for details.)

In fact numbers stored in the native integer format may be stored either
in the signed native form, or in the unsigned native form.  Thus the limits
for Perl numbers stored as native integers would typically be -2**31..2**32-1,
with appropriate modifications in the case of 64-bit integers.  Again, this
does not mean that Perl can do operations only over integers in this range:
it is possible to store many more integers in floating point format.

Summing up, Perl numeric values can store only those numbers which have
a finite decimal expansion or a "short" binary expansion.

=head1 Numeric operators and numeric conversions

As mentioned earlier, Perl can store a number in any one of three formats,
but most operators typically understand only one of those formats.  When
a numeric value is passed as an argument to such an operator, it will be
converted to the format understood by the operator.

Six such conversions are possible:

  native integer        --> native floating point	(*)
  native integer        --> decimal string
  native floating point --> native integer		(*)
  native floating point --> decimal string		(*)
  decimal string        --> native integer
  decimal string        --> native floating point	(*)

These conversions are governed by the following general rules:

=over 4

=item *

If the source number can be represented in the target form, that
representation is used.

=item *

If the source number is outside of the limits representable in the target form,
a representation of the closest limit is used.  (I<Loss of information>)

=item *

If the source number is between two numbers representable in the target form,
a representation of one of these numbers is used.  (I<Loss of information>)

=item *

In C<< native floating point --> native integer >> conversions the magnitude
of the result is less than or equal to the magnitude of the source.
(I<"Rounding to zero".>)

=item *

If the C<< decimal string --> native integer >> conversion cannot be done
without loss of information, the result is compatible with the conversion
sequence C<< decimal_string --> native_floating_point --> native_integer >>.
In particular, rounding is strongly biased to 0, though a number like
C<"0.99999999999999999999"> has a chance of being rounded to 1.

=back

B<RESTRICTION>: The conversions marked with C<(*)> above involve steps
performed by the C compiler.  In particular, bugs/features of the compiler
used may lead to breakage of some of the above rules.
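
A few of these rules can be illustrated with the built-in C<int> function
and with a string used in numeric context.  This is a small sketch rather
than an exhaustive test; the variable names are made up for the example.

    my $pos = int( 7.9);        # float --> integer, rounds toward zero
    my $neg = int(-7.9);        # magnitude shrinks: -7, not -8
    my $str = "1.5e3" + 0;      # decimal string --> number

    print "$pos $neg $str\n";   # prints "7 -7 1500"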

=head1 Flavors of Perl numeric operations

Perl operations which take a numeric argument treat that argument in one
of four different ways: they may force it to one of the integer/floating/
string formats, or they may behave differently depending on the format of
the operand.  Forcing a numeric value to a particular format does not
change the number stored in the value.

All the operators which need an argument in the integer format treat the
argument as in modular arithmetic, e.g., C<mod 2**32> on a 32-bit
architecture.  C<sprintf "%u", -1> therefore provides the same result as
C<sprintf "%u", ~0>.

=over 4

=item Arithmetic operators, C<no integer>

force the argument into the floating point format.

=item Arithmetic operators, C<use integer>

=item Bitwise operators, C<no integer>

force the argument into the integer format if it is not a string.

=item Bitwise operators, C<use integer>

force the argument into the integer format.

=item Operators which expect an integer

force the argument into the integer format.  This is applicable
to the third and fourth arguments of C<sysread>, for example.

=item Operators which expect a string

force the argument into the string format.  For example, this is
applicable to C<printf "%s", $value>.

=back

Though forcing an argument into a particular form does not change the
stored number, Perl remembers the result of such conversions.  In
particular, though the first such conversion may be time-consuming,
repeated operations will not need to redo the conversion.
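
As a rough illustration of these flavors, the sketch below runs the same
division with and without the C<integer> pragma and checks the modular
behavior of C<%u> mentioned above.  The variable names are illustrative.

    my $float = 10 / 3;                         # floating point division
    my $int   = do { use integer; 10 / 3 };     # integer division: 3
    my $same  = sprintf("%u", -1) eq sprintf("%u", ~0);
    my $text  = sprintf("%s", 1_000);           # forced to a string: "1000"

    print "$float $int $text\n";
    print "-1 wraps to the largest unsigned value\n" if $same;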

=head1 AUTHOR

Ilya Zakharevich C<ilya@math.ohio-state.edu>

Editorial adjustments by Gurusamy Sarathy <gsar@ActiveState.com>

=head1 SEE ALSO

L<overload>

=head1 NAME

perlobj - Perl objects

=head1 DESCRIPTION

First you need to understand what references are in Perl.
See L<perlref> for that.  Second, if you still find the following
reference work too complicated, a tutorial on object-oriented programming
in Perl can be found in L<perltoot> and L<perltootc>.

If you're still with us, then
here are three very simple definitions that you should find reassuring.

=over 4

=item 1.

An object is simply a reference that happens to know which class it
belongs to.

=item 2.

A class is simply a package that happens to provide methods to deal
with object references.

=item 3.

A method is simply a subroutine that expects an object reference (or
a package name, for class methods) as the first argument.

=back

We'll cover these points now in more depth.

=head2 An Object is Simply a Reference

Unlike say C++, Perl doesn't provide any special syntax for
constructors.  A constructor is merely a subroutine that returns a
reference to something "blessed" into a class, generally the
class that the subroutine is defined in.  Here is a typical
constructor:

    package Critter;
    sub new { bless {} }

That word C<new> isn't special.  You could have written
a constructor this way, too:

    package Critter;
    sub spawn { bless {} }

This might even be preferable, because the C++ programmers won't
be tricked into thinking that C<new> works in Perl as it does in C++.
It doesn't.  We recommend that you name your constructors whatever
makes sense in the context of the problem you're solving.  For example,
constructors in the Tk extension to Perl are named after the widgets
they create.

One thing that's different about Perl constructors compared with those in
C++ is that in Perl, they have to allocate their own memory.  (The other
thing is that they don't automatically call overridden base-class
constructors.)  The C<{}> allocates an anonymous hash containing no
key/value pairs, and returns it.  The bless() takes that reference and
tells the object it references that it's now a Critter, and returns
the reference.  This is for convenience, because the referenced object
itself knows that it has been blessed, and the reference to it could
have been returned directly, like this:

    sub new {
	my $self = {};
	bless $self;
	return $self;
    }

You often see such a thing in more complicated constructors
that wish to call methods in the class as part of the construction:

    sub new {
	my $self = {};
	bless $self;
	$self->initialize();
	return $self;
    }

If you care about inheritance (and you should; see
L<perlmodlib/"Modules: Creation, Use, and Abuse">),
then you want to use the two-arg form of bless
so that your constructors may be inherited:

    sub new {
	my $class = shift;
	my $self = {};
	bless $self, $class;
	$self->initialize();
	return $self;
    }

Or if you expect people to call not just C<< CLASS->new() >> but also
C<< $obj->new() >>, then use something like this.  The initialize()
method used will be of whatever $class we blessed the
object into:

    sub new {
	my $this = shift;
	my $class = ref($this) || $this;
	my $self = {};
	bless $self, $class;
	$self->initialize();
	return $self;
    }
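
To see why the two-argument form matters, here is a small self-contained
sketch.  The C<Dragon> subclass, the C<NAME> field, and the body of
initialize() are assumptions made up for this illustration; only the
constructor itself follows the pattern shown above.

    package Critter;

    sub new {
        my $this  = shift;
        my $class = ref($this) || $this;
        my $self  = {};
        bless $self, $class;
        $self->initialize(@_);
        return $self;
    }

    # Hypothetical initializer; the field name is illustrative.
    sub initialize {
        my ($self, $name) = @_;
        $self->{NAME} = $name;
    }

    package Dragon;
    our @ISA = ('Critter');                 # Dragon inherits new()

    package main;

    my $smaug = Dragon->new('Smaug');       # class method call
    my $other = $smaug->new('Glaurung');    # $obj->new() also works
    print ref($smaug), " ", ref($other), "\n";      # prints "Dragon Dragon"

With the one-argument form of bless, both objects would have been blessed
into Critter, the package in which new() was compiled.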

Within the class package, the methods will typically deal with the
reference as an ordinary reference.  Outside the class package,
the reference is generally treated as an opaque value that may
be accessed only through the class's methods.

Although a constructor can in theory re-bless a referenced object
currently belonging to another class, this is almost certainly going
to get you into trouble.  The new class is responsible for all
cleanup later.  The previous blessing is forgotten, as an object
may belong to only one class at a time.  (Although of course it's
free to inherit methods from many classes.)  If you find yourself
having to do this, the parent class is probably misbehaving, though.

A clarification:  Perl objects are blessed.  References are not.  Objects
know which package they belong to.  References do not.  The bless()
function uses the reference to find the object.  Consider
the following example:

    $a = {};
    $b = $a;
    bless $a, BLAH;
    print "\$b is a ", ref($b), "\n";

This reports $b as being a BLAH, so obviously bless()
operated on the object and not on the reference.

=head2 A Class is Simply a Package

Unlike say C++, Perl doesn't provide any special syntax for class
definitions.  You use a package as a class by putting method
definitions into the class.

There is a special array within each package called @ISA, which says
where else to look for a method if you can't find it in the current
package.  This is how Perl implements inheritance.  Each element of the
@ISA array is just the name of another package that happens to be a
class package.  The classes are searched (depth first) for missing
methods in the order that they occur in @ISA.  The classes accessible
through @ISA are known as base classes of the current class.

All classes implicitly inherit from class C<UNIVERSAL> as their
last base class.  Several commonly used methods are automatically
supplied in the UNIVERSAL class; see L<"Default UNIVERSAL methods"> for
more details.

If a missing method is found in a base class, it is cached
in the current class for efficiency.  Changing @ISA or defining new
subroutines invalidates the cache and causes Perl to do the lookup again.

If neither the current class, its named base classes, nor the UNIVERSAL
class contains the requested method, these three places are searched
all over again, this time looking for a method named AUTOLOAD().  If an
AUTOLOAD is found, this method is called on behalf of the missing method,
setting the package global $AUTOLOAD to be the fully qualified name of
the method that was intended to be called.

If none of that works, Perl finally gives up and complains.

If you want to stop the AUTOLOAD inheritance say simply

	sub AUTOLOAD;

and the call will die using the name of the sub being called.
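
The AUTOLOAD lookup described above is often used to supply accessors on
demand.  The following is one possible sketch, not the only way to do it;
the class layout and the error message are assumptions made for this
example.

    package Critter;
    our $AUTOLOAD;

    sub new {
        my $class = shift;
        return bless { @_ }, $class;
    }

    sub AUTOLOAD {
        my $self = shift;
        my $name = $AUTOLOAD;               # e.g. "Critter::Height"
        $name =~ s/.*:://;                  # strip the package qualifier
        return if $name eq 'DESTROY';       # don't autoload destructors
        die "No such method: $AUTOLOAD\n"
            unless ref($self) && exists $self->{$name};
        return $self->{$name};
    }

    package main;

    my $fred = Critter->new(Height => 1.5);
    print $fred->Height, "\n";              # prints "1.5"
    eval { $fred->Weight };                 # no such field, so this dies
    print "error: $@" if $@;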

Perl classes do method inheritance only.  Data inheritance is left up
to the class itself.  By and large, this is not a problem in Perl,
because most classes model the attributes of their object using an
anonymous hash, which serves as its own little namespace to be carved up
by the various classes that might want to do something with the object.
The only problem with this is that you can't be sure that you aren't using
a piece of the hash that is already used.  A reasonable workaround
is to prepend the package name to your field name in the hash.

    sub bump {
	my $self = shift;
	$self->{ __PACKAGE__ . ".count"}++;
    } 
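
The following sketch shows the workaround keeping a base class and a
subclass from trampling on each other's data.  The class names C<Counter>
and C<LoudCounter> are invented for the illustration.

    package Counter;

    sub new { return bless {}, shift }

    sub bump {
        my $self = shift;
        $self->{ __PACKAGE__ . ".count" }++;        # "Counter.count"
    }

    package LoudCounter;
    our @ISA = ('Counter');

    sub bump {
        my $self = shift;
        $self->{ __PACKAGE__ . ".count" } += 10;    # "LoudCounter.count"
        $self->SUPER::bump();
    }

    package main;

    my $c = LoudCounter->new;
    $c->bump;
    print join(", ", map { "$_=$c->{$_}" } sort keys %$c), "\n";
    # prints "Counter.count=1, LoudCounter.count=10"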

=head2 A Method is Simply a Subroutine

Unlike say C++, Perl doesn't provide any special syntax for method
definition.  (It does provide a little syntax for method invocation
though.  More on that later.)  A method expects its first argument
to be the object (reference) or package (string) it is being invoked
on.  There are two kinds of methods, which we'll call class
methods and instance methods.

A class method expects a class name as the first argument.  It
provides functionality for the class as a whole, not for any
individual object belonging to the class.  Constructors are often
class methods, but see L<perltoot> and L<perltootc> for alternatives.
Many class methods simply ignore their first argument, because they
already know what package they're in and don't care what package
they were invoked via.  (These aren't necessarily the same, because
class methods follow the inheritance tree just like ordinary instance
methods.)  Another typical use for class methods is to look up an
object by name:

    sub find {
	my ($class, $name) = @_;
	$objtable{$name};
    }

An instance method expects an object reference as its first argument.
Typically it shifts the first argument into a "self" or "this" variable,
and then uses that as an ordinary reference.

    sub display {
	my $self = shift;
	my @keys = @_ ? @_ : sort keys %$self;
	foreach my $key (@keys) {
	    print "\t$key => $self->{$key}\n";
	}
    }

=head2 Method Invocation

There are two ways to invoke a method, one of which you're already
familiar with, and the other of which will look familiar.  Perl 4
already had an "indirect object" syntax that you use when you say

    print STDERR "help!!!\n";

This same syntax can be used to call either class or instance methods.
We'll use the two methods defined above, the class method to look up
an object reference and the instance method to print out its attributes.

    $fred = find Critter "Fred";
    display $fred 'Height', 'Weight';

These could be combined into one statement by using a BLOCK in the
indirect object slot:

    display {find Critter "Fred"} 'Height', 'Weight';

For C++ fans, there's also a syntax using -> notation that does exactly
the same thing.  The parentheses are required if there are any arguments.

    $fred = Critter->find("Fred");
    $fred->display('Height', 'Weight');

or in one statement,

    Critter->find("Fred")->display('Height', 'Weight');

There are times when one syntax is more readable, and times when the
other syntax is more readable.  The indirect object syntax is less
cluttered, but it has the same ambiguity as ordinary list operators.
Indirect object method calls are usually parsed using the same rule as list
operators: "If it looks like a function, it is a function".  (Presuming
for the moment that you think two words in a row can look like a
function name.  C++ programmers seem to think so with some regularity,
especially when the first word is "new".)  Thus, the parentheses of

    new Critter ('Barney', 1.5, 70)

are assumed to surround ALL the arguments of the method call, regardless
of what comes after.  Saying

    new Critter ('Bam' x 2), 1.4, 45

would be equivalent to

    Critter->new('Bam' x 2), 1.4, 45

which is unlikely to do what you want.  Confusingly, however, this
rule applies only when the indirect object is a bareword package name,
not when it's a scalar, a BLOCK, or a C<Package::> qualified package name.
In those cases, the arguments are parsed in the same way as an
indirect object list operator like print, so

    new Critter:: ('Bam' x 2), 1.4, 45

is the same as

   Critter::->new(('Bam' x 2), 1.4, 45)

For more reasons why the indirect object syntax is ambiguous, see
L<"WARNING"> below.

There are times when you wish to specify which class's method to use.
Here you can call your method as an ordinary subroutine
call, being sure to pass the requisite first argument explicitly:

    $fred =  MyCritter::find("Critter", "Fred");
    MyCritter::display($fred, 'Height', 'Weight');

Unlike method calls, function calls don't consider inheritance.  If you wish
merely to specify that Perl should I<START> looking for a method in a
particular package, use an ordinary method call, but qualify the method
name with the package like this:

    $fred = Critter->MyCritter::find("Fred");
    $fred->MyCritter::display('Height', 'Weight');

If you're trying to control where the method search begins I<and> you're
executing in the class itself, then you may use the SUPER pseudo class,
which says to start looking in your base class's @ISA list without having
to name it explicitly:

    $self->SUPER::display('Height', 'Weight');

Please note that the C<SUPER::> construct is meaningful I<only> within the
class.

Sometimes you want to call a method when you don't know the method name
ahead of time.  You can use the arrow form, replacing the method name
with a simple scalar variable containing the method name or a
reference to the function.

    $method = $fast ? "findfirst" : "findbest";
    $fred->$method(@args);  	    # call by name

    if ($coderef = $fred->can($parent . "::findbest")) {
	$fred->$coderef(@args);	    # call by coderef
    }
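
A complete, runnable version of the same idea might look like this.  The
bodies of findfirst() and findbest() are placeholders invented for the
sketch; only the two calling forms are the point.

    package Critter;

    sub new       { return bless {}, shift }
    sub findfirst { return "first match" }      # placeholder body
    sub findbest  { return "best match" }       # placeholder body

    package main;

    my $fred   = Critter->new;
    my $fast   = 1;
    my $method = $fast ? "findfirst" : "findbest";
    my $byname = $fred->$method();              # call by name

    my $byref;
    if (my $coderef = $fred->can("findbest")) {
        $byref = $fred->$coderef();             # call by coderef
    }
    print "$byname, $byref\n";      # prints "first match, best match"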

=head2 WARNING

While indirect object syntax may well be appealing to English speakers and
to C++ programmers, be not seduced!  It suffers from two grave problems.

The first problem is that an indirect object is limited to a name,
a scalar variable, or a block, because it would have to do too much
lookahead otherwise, just like any other postfix dereference in the
language.  (These are the same quirky rules as are used for the filehandle
slot in functions like C<print> and C<printf>.)  This can lead to horribly
confusing precedence problems, as in these next two lines:

    move $obj->{FIELD};                 # probably wrong!
    move $ary[$i];                      # probably wrong!

Those actually parse as the very surprising:

    $obj->move->{FIELD};                # Well, lookee here
    $ary->move([$i]);                   # Didn't expect this one, eh?

Rather than what you might have expected:

    $obj->{FIELD}->move();              # You should be so lucky.
    $ary[$i]->move;                     # Yeah, sure.

The left side of ``->'' is not so limited, because it's an infix operator,
not a postfix operator.  

As if that weren't bad enough, think about this: Perl must guess I<at
compile time> whether C<name> and C<move> above are functions or methods.
Usually Perl gets it right, but when it doesn't, you get a function
call compiled as a method, or vice versa.  This can introduce subtle
bugs that are hard to unravel.  For example, calling a method C<new>
in indirect notation--as C++ programmers are so wont to do--can
be miscompiled into a subroutine call if there's already a C<new>
function in scope.  You'd end up calling the current package's C<new>
as a subroutine, rather than the desired class's method.  The compiler
tries to cheat by remembering bareword C<require>s, but the grief if it
messes up just isn't worth the years of debugging it would likely take
you to track such subtle bugs down.

The infix arrow notation using ``C<< -> >>'' doesn't suffer from either
of these disturbing ambiguities, so we recommend you use it exclusively.

=head2 Default UNIVERSAL methods

The C<UNIVERSAL> package automatically contains the following methods that
are inherited by all other classes:

=over 4

=item isa(CLASS)

C<isa> returns I<true> if its object is blessed into C<CLASS> or into a
subclass of C<CLASS>.

C<isa> is also exportable and can be called as a sub with two arguments.  This
makes it possible to check what a reference points to.  Example:

    use UNIVERSAL qw(isa);

    if(isa($ref, 'ARRAY')) {
    	#...
    }

=item can(METHOD)

C<can> checks to see if its object has a method called C<METHOD>.
If it does, a reference to the sub is returned; if it does not,
I<undef> is returned.

=item VERSION( [NEED] )

C<VERSION> returns the version number of the class (package).  If the
NEED argument is given then it will check that the current version (as
defined by the $VERSION variable in the given package) is not less than
NEED; it will die if this is not the case.  This method is normally
called as a class method.  This method is called automatically by the
C<VERSION> form of C<use>.

    use A 1.2 qw(some imported subs);
    # implies:
    A->VERSION(1.2);

=back

B<NOTE:> C<can> directly uses Perl's internal code for method lookup, and
C<isa> uses a very similar method and caching strategy. This may cause
strange effects if the Perl code dynamically changes @ISA in any package.

You may add other methods to the UNIVERSAL class via Perl or XS code.
You do not need to C<use UNIVERSAL> to make these methods
available to your program.  This is necessary only if you wish to
have C<isa> available as a plain subroutine in the current package.
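
The three methods can be tried out with a pair of toy classes.  The
classes, the method C<speak>, and the version number below are
assumptions invented for this sketch.

    package Critter;
    our $VERSION = '1.02';

    sub new   { return bless {}, shift }
    sub speak { return "..." }

    package Dragon;
    our @ISA = ('Critter');

    package main;

    my $d = Dragon->new;
    print "is a Critter\n" if     $d->isa('Critter');
    print "can speak\n"    if     $d->can('speak');
    print "cannot fly\n"   unless $d->can('fly');
    print "version ", Critter->VERSION, "\n";

    # Asking for a newer version than is available makes VERSION die.
    my $ok = eval { Critter->VERSION(2); 1 };
    print "too old: $@" unless $ok;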

=head2 Destructors

When the last reference to an object goes away, the object is
automatically destroyed.  (This may even be after you exit, if you've
stored references in global variables.)  If you want to capture control
just before the object is freed, you may define a DESTROY method in
your class.  It will automatically be called at the appropriate moment,
and you can do any extra cleanup you need to do.  Perl passes a reference
to the object under destruction as the first (and only) argument.  Beware
that the reference is a read-only value, and cannot be modified by
manipulating C<$_[0]> within the destructor.  The object itself (i.e.
the thingy the reference points to, namely C<${$_[0]}>, C<@{$_[0]}>, 
C<%{$_[0]}> etc.) is not similarly constrained.

If you arrange to re-bless the reference before the destructor returns,
perl will again call the DESTROY method for the re-blessed object after
the current one returns.  This can be used for clean delegation of
object destruction, or for ensuring that destructors in the base classes
of your choosing get called.  Explicitly calling DESTROY is also possible,
but is rarely needed.

Do not confuse the previous discussion with how objects I<CONTAINED> in the current
one are destroyed.  Such objects will be freed and destroyed automatically
when the current object is freed, provided no other references to them exist
elsewhere.
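
The timing of destruction can be made visible with a destructor that
records when it runs.  This is a minimal sketch; the C<Tracker> class and
its C<@LOG> array are invented for the example.

    package Tracker;
    our @LOG;

    sub new {
        my ($class, $name) = @_;
        return bless { NAME => $name }, $class;
    }

    sub DESTROY {
        my $self = shift;
        push @LOG, "destroyed $self->{NAME}";
    }

    package main;

    {
        my $first = Tracker->new('fred');
        my $copy  = $first;         # a second reference to the same object
        undef $first;               # one reference remains: no DESTROY yet
        push @Tracker::LOG, "still alive";
    }                               # last reference gone: DESTROY runs here

    print "$_\n" for @Tracker::LOG;
    # prints "still alive", then "destroyed fred"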

=head2 Summary

That's about all there is to it.  Now you need just to go off and buy a
book about object-oriented design methodology, and bang your forehead
with it for the next six months or so.

=head2 Two-Phased Garbage Collection

For most purposes, Perl uses a fast and simple, reference-based
garbage collection system.  That means there's an extra
dereference going on at some level, so if you haven't built
your Perl executable using your C compiler's C<-O> flag, performance
will suffer.  If you I<have> built Perl with C<cc -O>, then this
probably won't matter.

A more serious concern is that unreachable memory with a non-zero
reference count will not normally get freed.  Therefore, this is a bad
idea:

    {
	my $a;
	$a = \$a;
    }

Even though $a I<should> go away, it can't.  When building recursive data
structures, you'll have to break the self-reference yourself explicitly
if you don't care to leak.  For example, here's a self-referential
node such as one might use in a sophisticated tree structure:

    sub new_node {
	my $self = shift;
	my $class = ref($self) || $self;
	my $node = {};
	$node->{LEFT} = $node->{RIGHT} = $node;
	$node->{DATA} = [ @_ ];
	return bless $node => $class;
    }

If you create nodes like that, they (currently) won't go away unless you
break their self reference yourself.  (In other words, this is not to be
construed as a feature, and you shouldn't depend on it.)

Almost.

When an interpreter thread finally shuts down (usually when your program
exits), then a rather costly but complete mark-and-sweep style of garbage
collection is performed, and everything allocated by that thread gets
destroyed.  This is essential to support Perl as an embedded or a
multithreadable language.  For example, this program demonstrates Perl's
two-phased garbage collection:

    #!/usr/bin/perl
    package Subtle;

    sub new {
	my $test;
	$test = \$test;
	warn "CREATING " . \$test;
	return bless \$test;
    }

    sub DESTROY {
	my $self = shift;
	warn "DESTROYING $self";
    }

    package main;

    warn "starting program";
    {
	my $a = Subtle->new;
	my $b = Subtle->new;
	$$a = 0;  # break selfref
	warn "leaving block";
    }

    warn "just exited block";
    warn "time to die...";
    exit;

When run as F</tmp/test>, the following output is produced:

    starting program at /tmp/test line 18.
    CREATING SCALAR(0x8e5b8) at /tmp/test line 7.
    CREATING SCALAR(0x8e57c) at /tmp/test line 7.
    leaving block at /tmp/test line 23.
    DESTROYING Subtle=SCALAR(0x8e5b8) at /tmp/test line 13.
    just exited block at /tmp/test line 26.
    time to die... at /tmp/test line 27.
    DESTROYING Subtle=SCALAR(0x8e57c) during global destruction.

Notice that "global destruction" bit there?  That's the thread
garbage collector reaching the unreachable.

Objects are always destructed, even when regular refs aren't.  Objects
are destructed in a separate pass before ordinary refs just to 
prevent object destructors from using refs that have been themselves
destructed.  Plain refs are only garbage-collected if the destruct level
is greater than 0.  You can test the higher levels of global destruction
by setting the PERL_DESTRUCT_LEVEL environment variable, presuming
C<-DDEBUGGING> was enabled during perl build time.

A more complete garbage collection strategy will be implemented
at a future date.

In the meantime, the best solution is to create a non-recursive container
class that holds a pointer to the self-referential data structure.
Define a DESTROY method for the containing object's class that manually
breaks the circularities in the self-referential structure.
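
One way to arrange this is sketched below.  The class names C<Node> and
C<Tree>, the C<ROOT> field, and the C<$FREED> counter are assumptions
made for the illustration; a real tree would have to walk all of its
nodes when breaking links.

    package Node;                   # the self-referential structure
    our $FREED = 0;

    sub new {
        my $class = shift;
        my $node  = bless {}, $class;
        $node->{LEFT} = $node->{RIGHT} = $node;     # circular references
        return $node;
    }

    sub DESTROY { $FREED++ }        # counts nodes that really went away

    package Tree;                   # the non-recursive container

    sub new {
        my $class = shift;
        return bless { ROOT => Node->new }, $class;
    }

    sub DESTROY {
        my $self = shift;
        delete @{ $self->{ROOT} }{qw(LEFT RIGHT)};  # break the cycle
    }

    package main;

    { my $tree = Tree->new; }       # container goes out of scope here
    print "nodes freed: $Node::FREED\n";            # prints "nodes freed: 1"

Users hold only the container, so the circular structure is released as
soon as the last reference to the container disappears.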

=head1 SEE ALSO

A kinder, gentler tutorial on object-oriented programming in Perl can
be found in L<perltoot>, L<perlboot> and L<perltootc>.  You should
also check out L<perlbot> for other object tricks, traps, and tips, as
well as L<perlmodlib> for some style guides on constructing both
modules and classes.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
=head1 NAME

perlop - Perl operators and precedence

=head1 SYNOPSIS

Perl operators have the following associativity and precedence,
listed from highest precedence to lowest.  Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy.  (This makes learning Perl easier
for C folks.)  With very few exceptions, these all operate on scalar
values only, not array values.

    left	terms and list operators (leftward)
    left	->
    nonassoc	++ --
    right	**
    right	! ~ \ and unary + and -
    left	=~ !~
    left	* / % x
    left	+ - .
    left	<< >>
    nonassoc	named unary operators
    nonassoc	< > <= >= lt gt le ge
    nonassoc	== != <=> eq ne cmp
    left	&
    left	| ^
    left	&&
    left	||
    nonassoc	..  ...
    right	?:
    right	= += -= *= etc.
    left	, =>
    nonassoc	list operators (rightward)
    right	not
    left	and
    left	or xor

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects.  See L<overload>.

=head1 DESCRIPTION

=head2 Terms and List Operators (Leftward)

TERMs have the highest precedence in Perl.  They include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized.  Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments.  These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;		# prints 1324

the commas on the right of the sort are evaluated before the sort,
but the commas on the left are evaluated after.  In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);	# Obviously not what you want.
    print $foo, exit;	# Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit;	# This is what you want.
    print($foo), exit;	# Or this.
    print ($foo), exit;	# Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance.  See
L<Named Unary Operators> for more discussion of this.

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L<"I/O Operators">.

=head2 The Arrow Operator

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++.  If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.)  See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name).  See L<perlobj>.
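
As a brief illustration of the subscript and method forms described
above, here is a minimal sketch.  The variable names and the C<Counter>
package are hypothetical, invented only for this example.

    use strict;
    use warnings;

    my $aref = [10, 20, 30];
    my $href = { color => 'red' };
    my $cref = sub { return $_[0] * 2 };

    my $second  = $aref->[1];        # array element: 20
    my $color   = $href->{color};    # hash element: 'red'
    my $doubled = $cref->(21);       # subroutine call: 42

    # Method calls take a class name or an object on the left side.
    package Counter;                 # hypothetical class
    sub new  { my $class = shift; return bless { n => 0 }, $class }
    sub bump { my $self  = shift; return ++$self->{n} }

    package main;
    my $obj    = Counter->new;       # class name on the left
    my $method = 'bump';
    $obj->bump;                      # method named directly
    my $count  = $obj->$method();    # method name held in a scalar: 2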

=head2 Auto-increment and Auto-decrement

"++" and "--" work as in C.  That is, if placed before a variable, they
increment or decrement the variable before returning the value, and if
placed after, increment or decrement the variable after returning the value.

The auto-increment operator has a little extra builtin magic to it.  If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment.  If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');	# prints '100'
    print ++($foo = 'a0');	# prints 'a1'
    print ++($foo = 'Az');	# prints 'Ba'
    print ++($foo = 'zz');	# prints 'aaa'

The auto-decrement operator is not magical.
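
A few more cases, sketched under the rules above.  The variable names
are illustrative only.  The last lines show that "--" simply treats
the string as a number.

    use strict;
    use warnings;

    my @results;
    for my $start ('a9', 'Zz', 'Az9') {
        my $s = $start;       # so far used only as a string
        $s++;                 # magical string increment with carry
        push @results, $s;    # 'b0', 'AAa', 'Ba0'
    }

    my $n = 'a9';
    {
        no warnings 'numeric';   # 'a9' is numified to 0 here
        $n--;                    # not magical: $n is now -1
    }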

=head2 Exponentiation

Binary "**" is the exponentiation operator.  It binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)
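
The binding rules can be checked with a short sketch (variable names
are illustrative only):

    use strict;
    use warnings;

    my $neg   = -2 ** 4;       # -(2 ** 4), which is -16
    my $pos   = (-2) ** 4;     # 16
    my $tower = 2 ** 3 ** 2;   # right associative: 2 ** (3 ** 2) is 512
    my $root  = 2 ** 0.5;      # doubles are used, so this is about 1.41421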

=head2 Symbolic Unary Operators

Unary "!" performs logical negation, i.e., "not".  See also C<not> for a lower
precedence version of this.

Unary "-" performs arithmetic negation if the operand is numeric.  If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned.  Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned.  One effect of these rules is that C<-bareword> is equivalent
to C<"-bareword">.

Unary "~" performs bitwise negation, i.e., 1's complement.  For
example, C<0666 & ~027> is 0640.  (See also L<Integer Arithmetic> and
L<Bitwise String Operators>.)  Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the & operator to mask off the excess bits.
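
The rules for unary "-" and "~" can be sketched as follows.  The
variable names are illustrative only.

    use strict;
    use warnings;

    my $neg_num  = -(5);        # arithmetic negation: -5
    my $neg_word = -"foo";      # identifier-like string: "-foo"
    my $flipped  = -"-bar";     # leading minus becomes plus: "+bar"

    my $perms = 0666 & ~027;    # 0640
    my $byte  = ~0x0F & 0xFF;   # masked to 8 bits: 0xF0 on any platform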

Unary "+" has no effect whatsoever, even on strings.  It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments.  (See examples above under L<Terms and List Operators (Leftward)>.)

Unary "\" creates a reference to whatever follows it.  See L<perlreftut>
and L<perlref>.  Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.

=head2 Binding Operators

Binary "=~" binds a scalar expression to a pattern match.  Certain operations
search or modify the string $_ by default.  This operator makes that kind
of operation work on some other string.  The right argument is a search
pattern, substitution, or transliteration.  The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_.  When used in scalar context, the return value generally indicates the
success of the operation.  Behavior in list context depends on the particular
operator.  See L</"Regexp Quote-Like Operators"> for details.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time.  This can be less efficient than an explicit search, because the
pattern must be compiled every time the expression is evaluated.

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.
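
A minimal sketch of the binding operators follows.  The variable names
are illustrative only, and the last two lines show an expression,
rather than a literal pattern, on the right side.

    use strict;
    use warnings;

    my $line = "Hello, World";

    my $has_world = $line =~ /World/;      # true: searches $line, not $_
    my $no_digits = $line !~ /\d/;         # true: negated match

    (my $copy  = $line) =~ s/World/Perl/;  # substitute in a copy
    (my $upper = $line) =~ tr/a-z/A-Z/;    # transliterate a copy

    my $pattern = 'W\w+';                  # an ordinary string
    my $found   = $line =~ $pattern;       # compiled as a pattern at run time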

=head2 Multiplicative Operators

Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" computes the modulus of two numbers.  Given integer
operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
C<$a> minus the largest multiple of C<$b> that is not greater than
C<$a>.  If C<$b> is negative, then C<$a % $b> is C<$a> minus the
smallest multiple of C<$b> that is not less than C<$a> (i.e. the
result will be less than or equal to zero). 
Note that when C<use integer> is in scope, "%" gives you direct access
to the modulus operator as implemented by your C compiler.  This
operator is not as well defined for negative operands, but it will
execute faster.
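
The four sign combinations work out as follows when C<use integer> is
not in scope (a small sketch, with an illustrative array name):

    use strict;
    use warnings;

    my @mods = (
         7 %  3,    #  1:  7 - 6,    6 is the largest multiple of 3 <= 7
        -7 %  3,    #  2: -7 - (-9), -9 is the largest multiple of 3 <= -7
         7 % -3,    # -2:  7 - 9,    9 is the smallest multiple of -3 >= 7
        -7 % -3,    # -1: -7 - (-6), -6 is the smallest multiple of -3 >= -7
    );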

Binary "x" is the repetition operator.  In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand.  In list context, if the left operand is enclosed in
parentheses, it repeats the list.

    print '-' x 80;		# print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);	# tab over

    @ones = (1) x 80;		# a list of 80 1's
    @ones = (5) x @ones;	# set all elements to 5


=head2 Additive Operators

Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

=head2 Shift Operators

Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument.  Arguments should be
integers.  (See also L<Integer Arithmetic>.)

Binary ">>" returns the value of its left argument shifted right by
the number of bits specified by the right argument.  Arguments should
be integers.  (See also L<Integer Arithmetic>.)

=head2 Named Unary Operators

The various named unary operators are treated as functions with one
argument, with optional parentheses.  These include the filetest
operators, like C<-f>, C<-M>, etc.  See L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.  For example,
because named unary operators are higher precedence than ||:

    chdir $foo    || die;	# (chdir $foo) || die
    chdir($foo)   || die;	# (chdir $foo) || die
    chdir ($foo)  || die;	# (chdir $foo) || die
    chdir +($foo) || die;	# (chdir $foo) || die

but, because * is higher precedence than named operators:

    chdir $foo * 20;	# chdir ($foo * 20)
    chdir($foo) * 20;	# (chdir $foo) * 20
    chdir ($foo) * 20;	# (chdir $foo) * 20
    chdir +($foo) * 20;	# chdir ($foo * 20)

    rand 10 * 20;	# rand (10 * 20)
    rand(10) * 20;	# (rand 10) * 20
    rand (10) * 20;	# (rand 10) * 20
    rand +(10) * 20;	# rand (10 * 20)

See also L<"Terms and List Operators (Leftward)">.

=head2 Relational Operators

Binary "<" returns true if the left argument is numerically less than
the right argument.

Binary ">" returns true if the left argument is numerically greater
than the right argument.

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.

Binary "lt" returns true if the left argument is stringwise less than
the right argument.

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.

=head2 Equality Operators

Binary "==" returns true if the left argument is numerically equal to
the right argument.

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.

Binary "<=>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument.  If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef.  NaN is not "<", "==", ">",
"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.

    perl -le '$a = NaN; print "No NaN support here" if $a == $a'
    perl -le '$a = NaN; print "NaN support here" if $a != $a'

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.

Binary "cmp" returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if C<use locale> is in effect.  See L<perllocale>.
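
The difference between the numeric and stringwise operators shows up
most clearly when sorting.  This sketch assumes C<use locale> is not
in effect; the variable names are illustrative only.

    use strict;
    use warnings;

    my @nums = sort { $a <=> $b } (10, 9, 100, 1);   # 1 9 10 100
    my @strs = sort { $a cmp $b } (10, 9, 100, 1);   # 1 10 100 9

    my $same_num = ('1.0' == 1)   ? 1 : 0;   # 1: equal as numbers
    my $same_str = ('1.0' eq '1') ? 1 : 0;   # 0: different as strings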

=head2 Bitwise And

Binary "&" returns its operands ANDed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

=head2 Bitwise Or and Exclusive Or

Binary "|" returns its operands ORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Binary "^" returns its operands XORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)
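
For numeric operands, the three bitwise operators behave as in this
small sketch (the names and flag values are illustrative only):

    use strict;
    use warnings;

    my $and = 0b1100 & 0b1010;    # 0b1000, i.e. 8
    my $or  = 0b1100 | 0b1010;    # 0b1110, i.e. 14
    my $xor = 0b1100 ^ 0b1010;    # 0b0110, i.e. 6

    # A typical use: testing and toggling a flag bit.
    my $flags   = 0b0101;
    my $has_bit = ($flags & 0b0100) ? 1 : 0;   # 1: bit 2 is set
    my $toggled = $flags ^ 0b0100;             # 0b0001, i.e. 1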

=head2 C-style Logical And

Binary "&&" performs a short-circuit logical AND operation.  That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or

Binary "||" performs a short-circuit logical OR operation.  That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

The C<||> and C<&&> operators differ from C's in that, rather than returning
0 or 1, they return the last value evaluated.  Thus, a reasonably portable
way to find out the home directory (assuming it's not "0") might be:

    $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
	(getpwuid($<))[7] || die "You're homeless!\n";

In particular, this means that you shouldn't use this
for selecting between two aggregates for assignment:

    @a = @b || @c;		# this is wrong
    @a = scalar(@b) || @c;	# really meant this
    @a = @b ? @b : @c;		# this works fine, though
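
The "last value evaluated" rule and the short-circuit behavior can be
seen in this sketch.  The C<expensive> subroutine and the other names
are hypothetical, made up for the example.

    use strict;
    use warnings;

    my $calls = 0;
    sub expensive { $calls++; return 'computed' }

    my $empty    = '';
    my $fallback = $empty || 'default';   # 'default': last value evaluated
    my $both     = 'a' && 'b';            # 'b', not 1

    my $cached = 'cached';
    my $value  = $cached || expensive();  # expensive() is never called

    my %opt   = (width => 0);
    my $width = $opt{width} || 80;        # 80: a legitimate 0 is overridden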

As more readable alternatives to C<&&> and C<||> when used for
control flow, Perl provides C<and> and C<or> operators (see below).
The short-circuit behavior is identical.  The precedence of "and" and
"or" is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
	    or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
	    || (gripe(), next LINE);

Using "or" for assignment is unlikely to do what you want; see below.

=head2 Range Operators

Binary ".." is the range operator, which is really two different
operators depending on the context.  In list context, it returns a
list of values counting (up by ones) from the left value to the right
value.  If the left value is greater than the right value then it
returns the empty list.  The range operator is useful for writing
C<foreach (1..10)> loops and for doing slice operations on arrays.  In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:

    for (1 .. 1_000_000) {
	# code
    }

In scalar context, ".." returns a boolean value.  The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors.  Each ".." operator maintains its
own boolean state.  It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again.  It doesn't become false till the next time the range operator is
evaluated.  It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next
evaluation, as in B<sed>, just use three dots ("...") instead of
two.  In all other regards, "..." behaves just like ".." does.

The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state.  The precedence is a little lower
than || and &&.  The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true.  The
sequence number is reset for each range encountered.  The final
sequence number in a range has the string "E0" appended to it, which
doesn't affect its numeric value, but gives you something to search
for if you want to exclude the endpoint.  You can exclude the
beginning point by waiting for the sequence number to be greater
than 1.  If either operand of scalar ".." is a constant expression,
that operand is implicitly compared to the C<$.> variable, the
current line number.  Examples:

As a scalar operator:

    if (101 .. 200) { print; }	# print 2nd hundred lines
    next LINE if (1 .. /^$/);	# skip header lines
    s/^/> / if (/^$/ .. eof());	# quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof();
	# do something based on those
    } continue {
	close ARGV if eof; 		# reset $. each file
    }
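
The sequence numbers and the "E0" marker described above can be
observed directly.  This is a minimal sketch over an in-memory list
(the data and variable names are made up); because neither operand is
a constant expression, C<$.> plays no part.

    use strict;
    use warnings;

    my @lines = qw(intro BEGIN one two END outro);
    my (@inside, @seq);
    for (@lines) {
        # Scalar assignment puts ".." in scalar context (flip-flop).
        if (my $n = /^BEGIN$/ .. /^END$/) {
            push @inside, $_;    # BEGIN one two END
            push @seq,    $n;    # 1 2 3 4E0
        }
    }

    # Drop both endpoints: skip sequence number 1 and the "E0" entry.
    my @middle = map  { $inside[$_] }
                 grep { $seq[$_] > 1 && $seq[$_] !~ /E0$/ } 0 .. $#seq;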

As a list operator:

    for (101 .. 200) { print; }	# print $_ 100 times
    @foo = @foo[0 .. $#foo];	# an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];	# slice last 5 items

The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings.  You
can say

    @alphabet = ('A' .. 'Z');

to get all normal letters of the alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday];

to get dates with leading zeros.  If the final value specified is not
in the sequence that the magical increment would produce, the sequence
goes until the next value would be longer than the final value
specified.
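
A short sketch of these list-context rules, with illustrative variable
names:

    use strict;
    use warnings;

    my @carry  = ('y' .. 'ab');     # y z aa ab
    my @padded = ('08' .. '11');    # 08 09 10 11
    my @capped = ('x' .. 'B');      # x y z: 'B' is never produced, and
                                    # 'aa' would be longer than 'B'
    my @empty  = (5 .. 1);          # empty: left value is greater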

=head2 Conditional Operator

Ternary "?:" is the conditional operator, just as in C.  It works much
like an if-then-else.  If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned.  For example:

    printf "I have %d dog%s.\n", $n,
	    ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble.  For example, this:

    $a % 2 ? $a += 10 : $a += 2

Really means this:

    (($a % 2) ? ($a += 10) : $a) += 2

Rather than this:

    ($a % 2) ? ($a += 10) : ($a += 2)

That should probably be written more simply as:

    $a += ($a % 2) ? 10 : 2;

=head2 Assignment Operators

"=" is the ordinary assignment operator.

Assignment operators work as in C.  That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie().  Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
	         x=

Although these are grouped by family, they all have the precedence
of assignment.

Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
then modifying the variable that was assigned to.  This is useful
for modifying a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;

Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
the number of elements produced by the expression on the right hand
side of the assignment.
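
The lvalue and counting behavior described above can be sketched as
follows (variable names are illustrative only):

    use strict;
    use warnings;

    my $text = 'Hello';
    (my $lower = $text) =~ tr/A-Z/a-z/;    # 'hello'; $text is unchanged

    my $n = 5;
    ($n += 2) *= 3;                        # 21

    my ($x, $y);
    my $count = (($x, $y) = (7, 8, 9));    # 3: elements on the right side

    # The same rule gives a common counting idiom.
    my $words = () = 'one two three four' =~ /\w+/g;   # 4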

=head2 Comma Operator

Binary "," is the comma operator.  In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value.  This is just like C's comma operator.

In list context, it's just the list argument separator, and inserts
both its arguments into the list.

The => digraph is mostly just a synonym for the comma operator.  It's useful for
documenting arguments that come in pairs.  As of release 5.001, it also forces
any word to the left of it to be interpreted as a string.
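
A minimal sketch of both contexts and of the "=>" digraph, using
made-up variable names:

    use strict;
    use warnings;

    my $i    = 0;
    my $last = ($i++, $i + 10);    # scalar context: 11, and $i is now 1

    my @list = ($i, $i + 10);      # list context: (1, 11)

    my %color = (red => 0xff0000, green => 0x00ff00);  # words are quoted
    my @pair  = (name => 'value');                     # ('name', 'value')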

=head2 List Operators (Rightward)

On the right side of a list operator, the comma has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
"and", "or", and "not", which may be used to evaluate calls to list
operators without the need for extra parentheses:

    open HANDLE, "filename"
	or die "Can't open: $!\n";

See also discussion of list operators in L<Terms and List Operators (Leftward)>.

=head2 Logical Not

Unary "not" returns the logical negation of the expression to its right.
It's the equivalent of "!" except for the very low precedence.

=head2 Logical And

Binary "and" returns the logical conjunction of the two surrounding
expressions.  It's equivalent to && except for the very low
precedence.  This means that it short-circuits: i.e., the right
expression is evaluated only if the left expression is true.

=head2 Logical or and Exclusive Or

Binary "or" returns the logical disjunction of the two surrounding
expressions.  It's equivalent to || except for the very low precedence.
This makes it useful for control flow:

    print FH $data		or die "Can't write to FH: $!";

This means that it short-circuits: i.e., the right expression is evaluated
only if the left expression is false.  Due to its precedence, you should
probably avoid using this for assignment, only for control flow.

    $a = $b or $c;		# bug: this is wrong
    ($a = $b) or $c;		# really means this
    $a = $b || $c;		# better written this way

However, when it's a list-context assignment and you're trying to use
"||" for control flow, you probably need "or" so that the assignment
takes higher precedence.

    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due

Then again, you could always use parentheses. 

Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short-circuit, of course.

=head2 C Operators Missing From Perl

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator.  (But see the "\" operator for taking a reference.)

=item unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)

=item (TYPE)

Type-casting operator.

=back

=head2 Quote and Quote-like Operators

While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities.  Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them.  In the following table, a C<{}> represents
any pair of delimiters you choose.  

    Customary  Generic        Meaning	     Interpolates
	''	 q{}	      Literal		  no
	""	qq{}	      Literal		  yes
	``	qx{}	      Command		  yes (unless '' is delimiter)
		qw{}	     Word list		  no
	//	 m{}	   Pattern match	  yes (unless '' is delimiter)
		qr{}	      Pattern		  yes (unless '' is delimiter)
		 s{}{}	    Substitution	  yes (unless '' is delimiter)
		tr{}{}	  Transliteration	  no (but see below)

Non-bracketing delimiters use the same character fore and aft, but the four
sorts of brackets (round, angle, square, curly) will all nest, which means
that 

	q{foo{bar}baz} 

is the same as 

	'foo{bar}baz'

Note, however, that this does not always work for quoting Perl code:

	$s = q{ if($a eq "}") ... }; # WRONG

is a syntax error. The C<Text::Balanced> module on CPAN is able to do this
properly.

There can be whitespace between the operator and the quoting
characters, except when C<#> is being used as the quoting character.
C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
operator C<q> followed by a comment.  Its argument will be taken
from the next line.  This allows you to write:

    s {foo}  # Replace foo
      {bar}  # with bar.
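
The generic forms can be tried out with a sketch like this one; the
strings and variable names are illustrative only.

    use strict;
    use warnings;

    my $name   = 'World';
    my $nested = q{foo{bar}baz};       # same as 'foo{bar}baz'
    my $greet  = qq[Hello, $name!];    # interpolates: 'Hello, World!'
    my $plain  = q(Hello, $name!);     # no interpolation
    my @words  = qw/alpha beta gamma/; # three-element word list

    # Bracketing delimiters avoid escaping the slashes in a path.
    (my $path = '/usr/local/bin') =~ s{/usr/local}  # replace this prefix
                                      {/opt};       # with this one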

For constructs that do interpolate, variables beginning with "C<$>"
or "C<@>" are interpolated, as are the following escape sequences.  Within
a transliteration, the first eleven of these sequences may be used.

    \t		tab             (HT, TAB)
    \n		newline         (NL)
    \r		return          (CR)
    \f		form feed       (FF)
    \b		backspace       (BS)
    \a		alarm (bell)    (BEL)
    \e		escape          (ESC)
    \033	octal char	(ESC)
    \x1b	hex char	(ESC)
    \x{263a}	wide hex char	(SMILEY)
    \c[		control char    (ESC)
    \N{name}	named char

    \l		lowercase next char
    \u		uppercase next char
    \L		lowercase till \E
    \U		uppercase till \E
    \E		end case modification
    \Q		quote non-word characters till \E

If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
and C<\U> is taken from the current locale.  See L<perllocale>.  For
documentation of C<\N{name}>, see L<charnames>.
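
A few of these sequences in use, sketched with made-up strings and
assuming C<use locale> is not in effect:

    use strict;
    use warnings;

    my $name   = 'jOHN sMITH';
    my $title  = "\u\L$name\E";        # 'John smith'
    my $shout  = "\Uhello\E there";    # 'HELLO there'
    my $lcf    = "\lABC";              # 'aBC'
    my $quoted = "\Qa.b*c\E";          # 'a\.b\*c'

    # On ASCII platforms these all spell the same ESC character.
    my $same = ("\e" eq "\x1b" && "\e" eq "\033" && "\e" eq "\c[") ? 1 : 0;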

All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline".  There is no such thing as an unvarying, physical
newline character.  It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve.  Not all
systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF.  For example,
on a Mac, these are reversed, and on systems without a line terminator,
printing C<"\n"> may emit no actual data.  In general, use C<"\n"> when
you mean a "newline" for your system, but use the literal ASCII when you
need an exact character.  For example, most networking protocols expect
and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
and although they often accept just C<"\012">, they seldom tolerate just
C<"\015">.  If you get in the habit of using C<"\n"> for networking,
you may be burned some day.

You cannot include a literal C<$> or C<@> within a C<\Q> sequence. 
An unescaped C<$> or C<@> interpolates the corresponding variable, 
while escaping will cause the literal string C<\$> to be inserted.
You'll need to write something like C<m/\Quser\E\@\Qhost/>. 

Patterns are subject to an additional level of interpretation as a
regular expression.  This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
pattern from the variables.  If this is not what you want, use C<\Q> to
interpolate a variable literally.
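
The effect of the second pass, and of C<\Q>, can be seen in a minimal
sketch (the variable names are illustrative only):

    use strict;
    use warnings;

    my $input = 'a.c';    # meant to be matched literally

    my $loose   = ('abc' =~ /^$input$/)     ? 1 : 0;  # 1: "." matches "b"
    my $literal = ('abc' =~ /^\Q$input\E$/) ? 1 : 0;  # 0: "." is quoted
    my $exact   = ('a.c' =~ /^\Q$input\E$/) ? 1 : 0;  # 1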

Apart from the behavior described above, Perl does not expand
multiple levels of interpolation.  In particular, contrary to the
expectations of shell programmers, back-quotes do I<NOT> interpolate
within double quotes, nor do single quotes impede evaluation of
variables when used within double quotes.

=head2 Regexp Quote-Like Operators

Here are the quote-like operators that apply to pattern
matching and related activities.

=over 8

=item ?PATTERN?

This is just like the C</pattern/> search, except that it matches only
once between calls to the reset() operator.  This is a useful
optimization when you want to see only the first occurrence of
something in each file of a set of files, for instance.  Only C<??>
patterns local to the current package are reset.

    while (<>) {
	if (?^$?) {
			    # blank line between header and body
	}
    } continue {
	reset if eof;	    # clear ?? status for next file
    }

This usage is vaguely deprecated, which means it just might possibly
be removed in some distant future version of Perl, perhaps somewhere
around the year 2168.

=item m/PATTERN/cgimosx

=item /PATTERN/cgimosx

Searches a string for a pattern match, and in scalar context returns
true if it succeeds, false if it fails.  If no string is specified
via the C<=~> or C<!~> operator, the $_ string is searched.  (The
string specified with C<=~> need not be an lvalue--it may be the
result of an expression evaluation, but remember the C<=~> binds
rather tightly.)  See also L<perlre>.  See L<perllocale> for
discussion of additional considerations that apply when C<use locale>
is in effect.

Options are:

    c	Do not reset search position on a failed match when /g is in effect.
    g	Match globally, i.e., find all occurrences.
    i	Do case-insensitive pattern matching.
    m	Treat string as multiple lines.
    o	Compile pattern only once.
    s	Treat string as single line.
    x	Use extended regular expressions.

If "/" is the delimiter then the initial C<m> is optional.  With the C<m>
you can use any pair of non-alphanumeric, non-whitespace characters 
as delimiters.  This is particularly useful for matching path names
that contain "/", to avoid LTS (leaning toothpick syndrome).  If "?" is
the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
If "'" is the delimiter, no interpolation is performed on the PATTERN.

PATTERN may contain variables, which will be interpolated (and the
pattern recompiled) every time the pattern search is evaluated, except
for when the delimiter is a single quote.  (Note that C<$(>, C<$)>, and
C<$|> are not interpolated because they look like end-of-string tests.)
If you want such a pattern to be compiled only once, add a C</o> after
the trailing delimiter.  This avoids expensive run-time recompilations,
and is useful when the value you are interpolating won't change over
the life of the script.  However, mentioning C</o> constitutes a promise
that you won't change the variables in the pattern.  If you change them,
Perl won't even notice.  See also L<"qr/STRING/imosx">.

If the PATTERN evaluates to the empty string, the last
I<successfully> matched regular expression is used instead.

If the C</g> option is not used, C<m//> in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, i.e., (C<$1>, C<$2>, C<$3>...).  (Note that here C<$1> etc. are
also set, and that this differs from Perl 4's behavior.)  When there are
no parentheses in the pattern, the return value is the list C<(1)> for
success.  With or without parentheses, an empty list is returned upon
failure.

Examples:

    open(TTY, '/dev/tty') or die "Can't open /dev/tty: $!";
    <TTY> =~ /^y/i && foo();	# do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
	print if /$arg/o;	# compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the
remainder of the line, and assigns those three fields to $F1, $F2, and
$Etc.  The conditional is true if any variables were assigned, i.e., if
the pattern matched.
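
The three list-context return values described earlier (captured
subexpressions, the list C<(1)>, and the empty list) can be compared
in a small sketch with made-up data:

    use strict;
    use warnings;

    my ($hour, $min) = '12:34' =~ /^(\d+):(\d+)$/;   # ('12', '34')
    my @no_parens    = 'abc' =~ /b/;                 # (1)
    my @failed       = 'abc' =~ /(x)/;               # ()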

The C</g> modifier specifies global pattern matching--that is,
matching as many times as possible within the string.  How it behaves
depends on the context.  In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
expression.  If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.

In scalar context, each execution of C<m//g> finds the next match,
returning true if it matches, and false if there is no further match.
The position after the last match can be read or set using the pos()
function; see L<perlfunc/pos>.   A failed match normally resets the
search position to the beginning of the string, but you can avoid that
by adding the C</c> modifier (e.g. C<m//gc>).  Modifying the target
string also resets the search position.

You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
zero-width assertion that matches the exact position where the previous
C<m//g>, if any, left off.  Without the C</g> modifier, the C<\G> assertion
still anchors at pos(), but the match is of course only attempted once.
Using C<\G> without C</g> on a target string that has not previously had a
C</g> match applied to it is the same as using the C<\A> assertion to match
the beginning of the string.

Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = "";
    while (defined($paragraph = <>)) {
	while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
	    $sentences++;
	}
    }
    print "$sentences\n";

    # using m//gc with \G
    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
    Final: 'q', pos=8

Notice that the final match matched C<q> instead of C<p>, which a match
without the C<\G> anchor would have done. Also note that the final match
did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
final match did indeed match C<p>, it's a good bet that you're running an
older (pre-5.6.0) Perl.
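
Here is a smaller sketch of how pos() interacts with C</g> and C</c>.
The string and the patterns are made up for illustration:

    my $str = "aXbXc";
    $str =~ /X/g;       # first match; pos($str) is now 2
    $str =~ /X/g;       # second match; pos($str) is now 4
    $str =~ /Z/gc;      # fails, but /c keeps pos($str) at 4
    print pos($str), "\n";
    $str =~ /Z/g;       # fails without /c, so pos($str) is reset
    print defined pos($str) ? "set" : "reset", "\n";
    pos($str) = 1;      # pos() may also be assigned to
    print "found at ", pos($str) - 1, "\n" if $str =~ /b/g;

This should print C<4>, then C<reset>, then C<found at 2>.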

A useful idiom for C<lex>-like scanners is C</\G.../gc>.  You can
combine several regexps like this to process a string part-by-part,
doing different actions depending on which regexp matched.  Each
regexp tries to match where the previous one leaves off.

 $_ = <<'EOL';
      $url = new URI::URL "http://www/";   die if $url eq "xXx";
 EOL
 LOOP:
    {
      print(" digits"),		redo LOOP if /\G\d+\b[,.;]?\s*/gc;
      print(" lowercase"),	redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
      print(" UPPERCASE"),	redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
      print(" Capitalized"),	redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
      print(" MiXeD"),		redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
      print(" alphanumeric"),	redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
      print(" line-noise"),	redo LOOP if /\G[^A-Za-z0-9]+/gc;
      print ". That's all!\n";
    }

Here is the output (split into several lines):

 line-noise lowercase line-noise lowercase UPPERCASE line-noise
 UPPERCASE line-noise lowercase line-noise lowercase line-noise
 lowercase lowercase line-noise lowercase lowercase line-noise
 MiXeD line-noise. That's all!
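
The same idiom can collect tokens rather than print them.  The following
is only a sketch: the token names C<NUM>, C<ID>, and C<OP> and the tiny
input grammar are assumptions made for the example.

    my $input = "x = 42 + y";
    my @tokens;
    for ($input) {                  # alias $_ to $input
        while (1) {
            /\G\s+/gc;              # skip whitespace, if any
            if    (/\G(\d+)/gc)        { push @tokens, "NUM($1)" }
            elsif (/\G([a-z_]\w*)/gci) { push @tokens, "ID($1)"  }
            elsif (/\G([=+*\/-])/gc)   { push @tokens, "OP($1)"  }
            else                       { last }
        }
    }
    print "@tokens\n";

Because every pattern uses C</c>, a failed alternative leaves pos()
alone and the next alternative is tried at the same place.  The loop
stops at the end of the string or at the first character that no
pattern recognizes.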

=item q/STRING/

=item C<'STRING'>

A single-quoted, literal string.  A backslash represents a backslash
unless followed by the delimiter or another backslash, in which case
the delimiter or backslash is interpolated.

    $foo = q!I said, "You said, 'She said it.'"!;
    $bar = q('This is it.');
    $baz = '\n';		# a two-character string

=item qq/STRING/

=item "STRING"

A double-quoted, interpolated string.

    $_ .= qq
     (*** The previous line contains the naughty word "$1".\n)
		if /\b(tcl|java|python)\b/i;      # :-)
    $baz = "\n";		# a one-character string

=item qr/STRING/imosx

This operator quotes (and possibly compiles) its I<STRING> as a regular
expression.  I<STRING> is interpolated the same way as I<PATTERN>
in C<m/PATTERN/>.  If "'" is used as the delimiter, no interpolation
is done.  Returns a Perl value which may be used instead of the
corresponding C</STRING/imosx> expression.

For example,

    $rex = qr/my.STRING/is;
    s/$rex/foo/;

is equivalent to

    s/my.STRING/foo/is;

The result may be used as a subpattern in a match:

    $re = qr/$pattern/;
    $string =~ /foo${re}bar/;	# can be interpolated in other patterns
    $string =~ $re;		# or used standalone
    $string =~ /$re/;		# or this way

Since Perl may compile the pattern at the moment of execution of the qr()
operator, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:

    sub match {
	my $patterns = shift;
	my @compiled = map qr/$_/i, @$patterns;
	grep {
	    my $success = 0;
	    foreach my $pat (@compiled) {
		$success = 1, last if /$pat/;
	    }
	    $success;
	} @_;
    }

Precompilation of the pattern into an internal representation at
the moment of qr() avoids a need to recompile the pattern every
time a match C</$pat/> is attempted.  (Perl has many other internal
optimizations, but none would be triggered in the above example if
we did not use the qr() operator.)
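
Compiled patterns can also be combined into a larger one, because a
qr() value stringifies to a self-contained subpattern.  A minimal
sketch, with a word list chosen only for illustration:

    my @words = map { qr/\b\Q$_\E\b/i } qw(perl awk sed);
    my $any   = join '|', @words;   # qr values stringify
    my $re    = qr/$any/;           # the combined pattern
    my @hits  = grep { $_ =~ $re } "I like Perl", "C only", "sed rules";
    print scalar(@hits), "\n";      # 2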

Options are:

    i	Do case-insensitive pattern matching.
    m	Treat string as multiple lines.
    o	Compile pattern only once.
    s	Treat string as single line.
    x	Use extended regular expressions.

See L<perlre> for additional information on valid syntax for STRING, and
for a detailed look at the semantics of regular expressions.

=item qx/STRING/

=item `STRING`

A string which is (possibly) interpolated and then executed as a
system command with C</bin/sh> or its equivalent.  Shell wildcards,
pipes, and redirections will be honored.  The collected standard
output of the command is returned; standard error is unaffected.  In
scalar context, it comes back as a single (potentially multi-line)
string, or undef if the command failed.  In list context, returns a
list of lines (however you've defined lines with $/ or
$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.

Because backticks do not affect standard error, use shell file descriptor
syntax (assuming the shell supports this) if you care to address this.
To capture a command's STDERR and STDOUT together:

    $output = `cmd 2>&1`;

To capture a command's STDOUT but discard its STDERR:

    $output = `cmd 2>/dev/null`;

To capture a command's STDERR but discard its STDOUT (ordering is
important here):

    $output = `cmd 2>&1 1>/dev/null`;

To exchange a command's STDOUT and STDERR in order to capture the STDERR
but leave its STDOUT to come out the old STDERR:

    $output = `cmd 3>&1 1>&2 2>&3 3>&-`;

To read both a command's STDOUT and its STDERR separately, it's easiest
and safest to redirect them separately to files, and then read from those
files when the program is done:

    system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");
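
A fuller sketch of that approach follows.  It assumes a Unix-like
shell, uses C<File::Temp> to pick the file names instead of fixed
F</tmp> paths, and runs the current perl (C<$^X>) as a stand-in for
C<program args>:

    use File::Temp qw(tempfile);
    my (undef, $out_file) = tempfile();
    my (undef, $err_file) = tempfile();
    system(qq{$^X -e 'print "out"; warn "err\\n"' 1>$out_file 2>$err_file});
    my %got;
    for my $f ($out_file, $err_file) {
        open(my $fh, '<', $f) or die "Can't read $f: $!";
        local $/;                   # slurp the whole file
        $got{$f} = <$fh>;
        close $fh;
    }
    print "stdout=$got{$out_file} stderr=$got{$err_file}";
    unlink $out_file, $err_file;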

Using single-quote as a delimiter protects the command from Perl's
double-quote interpolation, passing it on to the shell instead:

    $perl_info  = qx(ps $$);            # that's Perl's $$
    $shell_info = qx'ps $$';            # that's the new shell's $$

How that string gets evaluated is entirely subject to the command
interpreter on your system.  On most platforms, you will have to protect
shell metacharacters if you want them treated literally.  This is in
practice difficult to do, as it's unclear how to escape which characters.
See L<perlsec> for a clean and safe example of a manual fork() and exec()
to emulate backticks safely.

On some platforms (notably DOS-like ones), the shell may not be
capable of dealing with multiline commands, so putting newlines in
the string may not get you what you want.  You may be able to evaluate
multiple commands in a single line by separating them with the command
separator character, if your shell supports that (e.g. C<;> on many Unix
shells; C<&> on the Windows NT C<cmd> shell).

Beginning with v5.6.0, Perl will attempt to flush all files opened for
output before starting the child process, but this may not be supported
on some platforms (see L<perlport>).  To be safe, you may need to set
C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
C<IO::Handle> on any open handles.

Beware that some command shells may place restrictions on the length
of the command line.  You must ensure your strings don't exceed this
limit after any necessary interpolations.  See the platform-specific
release notes for more details about your particular environment.

Using this operator can lead to programs that are difficult to port,
because the shell commands called vary between systems, and may in
fact not be present at all.  As one example, the C<type> command under
the POSIX shell is very different from the C<type> command under DOS.
That doesn't mean you should go out of your way to avoid backticks
when they're the right way to get something done.  Perl was made to be
a glue language, and one of the things it glues together is commands.
Just understand what you're getting yourself into.

See L<"I/O Operators"> for more discussion.

=item qw/STRING/

Evaluates to a list of the words extracted out of STRING, using embedded
whitespace as the word delimiters.  It can be understood as being roughly
equivalent to:

    split(' ', q/STRING/);

the difference being that it generates a real list at compile time.  So
this expression:

    qw(foo bar baz)

is semantically equivalent to the list:

    'foo', 'bar', 'baz'

Some frequently seen examples:

    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );

A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string.  For this reason, the
C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the STRING contains the "," or the "#" character.
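
To see what the comma mistake actually produces, consider this sketch.
The C<no warnings 'qw'> line is there only to keep the demonstration
quiet:

    my @right = qw(foo bar baz);     # three plain words
    my @wrong;
    {
        no warnings 'qw';            # silence the warning for the demo
        @wrong = qw(foo, bar, baz);  # commas become part of the words
    }
    print "$wrong[0]\n";             # prints "foo,"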

=item s/PATTERN/REPLACEMENT/egimosx

Searches a string for a pattern, and if found, replaces that pattern
with the replacement text and returns the number of substitutions
made.  Otherwise it returns false (specifically, the empty string).

If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified.  (The string specified with C<=~> must
be a scalar variable, an array element, a hash element, or an assignment
to one of those, i.e., an lvalue.)

If the delimiter chosen is a single quote, no interpolation is
done on either the PATTERN or the REPLACEMENT.  Otherwise, if the
PATTERN contains a $ that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
at run-time.  If you want the pattern compiled only once the first time
the variable is interpolated, use the C</o> option.  If the pattern
evaluates to the empty string, the last successfully executed regular
expression is used instead.  See L<perlre> for further explanation on these.
See L<perllocale> for discussion of additional considerations that apply
when C<use locale> is in effect.

Options are:

    e	Evaluate the right side as an expression.
    g	Replace globally, i.e., all occurrences.
    i	Do case-insensitive pattern matching.
    m	Treat string as multiple lines.
    o	Compile pattern only once.
    s	Treat string as single line.
    x	Use extended regular expressions.

Any non-alphanumeric, non-whitespace delimiter may replace the
slashes.  If single quotes are used, no interpretation is done on the
replacement string (the C</e> modifier overrides this, however).  Unlike
Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
text is not evaluated as a command.  If the
PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
pair of quotes, which may or may not be bracketing quotes, e.g.,
C<s(foo)(bar)> or C<< s<foo>/bar/ >>.  A C</e> will cause the
replacement portion to be treated as a full-fledged Perl expression
and evaluated right then and there.  It is, however, syntax checked at
compile-time. A second C<e> modifier will cause the replacement portion
to be C<eval>ed before being run as a Perl expression.

Examples:

    s/\bgreen\b/mauve/g;		# don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/; # run-time pattern

    ($foo = $bar) =~ s/this/that/;	# copy first, then change

    $count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;		# yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;	# yields 'abc  246xyz'
    s/\w/$& x 2/eg;		# yields 'aabbcc  224466xxyyzz'

    s/%(.)/$percent{$1}/g;	# change percent escapes; no /e
    s/%(.)/$percent{$1} || $&/ge;	# expr now, so /e
    s/^=(\w+)/&pod($1)/ge;	# use function call

    # expand variables in $_, but only dynamic (package)
    # variables, using symbolic dereferencing
    s/\$(\w+)/${$1}/g;

    # Add one to the value of any numbers in the string
    s/(\d+)/1 + $1/eg;

    # This will expand any embedded scalar variable
    # (including lexicals) in $_ : First $1 is interpolated
    # to the variable name, and then evaluated
    s/(\$\w+)/$1/eeg;

    # Delete (most) C comments.
    $program =~ s {
	/\*	# Match the opening delimiter.
	.*?	# Match a minimal number of characters.
	\*/	# Match the closing delimiter.
    } []gsx;

    s/^\s*(.*?)\s*$/$1/;	# trim white space in $_, expensively

    for ($variable) {		# trim white space in $variable, cheap
	s/^\s+//;
	s/\s+$//;
    }

    s/([^ ]*) *([^ ]*)/$2 $1/;	# reverse 1st two fields

Note the use of $ instead of \ in the last example.  Unlike
B<sed>, we use the \<I<digit>> form in only the left hand side.
Anywhere else it's $<I<digit>>.

Occasionally, you can't use just a C</g> to get all the changes
to occur that you might want.  Here are two common cases:

    # put commas in the right places in an integer
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;  

    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
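
The loop is needed in the first case because one pass of C</g> only
inserts the rightmost comma: after C<4567> has been rewritten, the
search continues to the right of it.  A sketch that wraps the idiom in
a subroutine (the name C<commify> is ours, and only integers are
handled):

    sub commify {
        my $n = shift;
        # pass 1: 1234567 -> 1234,567;  pass 2: -> 1,234,567
        1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
        return $n;
    }
    print commify(1234567), "\n";    # 1,234,567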

=item tr/SEARCHLIST/REPLACEMENTLIST/cds

=item y/SEARCHLIST/REPLACEMENTLIST/cds

Transliterates all occurrences of the characters found in the search list
with the corresponding character in the replacement list.  It returns
the number of characters replaced or deleted.  If no string is
specified via the =~ or !~ operator, the $_ string is transliterated.  (The
string specified with =~ must be a scalar variable, an array element, a
hash element, or an assignment to one of those, i.e., an lvalue.)

A character range may be specified with a hyphen, so C<tr/A-J/0-9/> 
does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
For B<sed> devotees, C<y> is provided as a synonym for C<tr>.  If the
SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
its own pair of quotes, which may or may not be bracketing quotes,
e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.

Note that C<tr> does B<not> do regular expression character classes
such as C<\d> or C<[:lower:]>.  The C<tr> operator is not equivalent to
the tr(1) utility.  If you want to map strings between lower/upper
cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
using the C<s> operator if you need regular expressions.

Note also that the whole range idea is rather unportable between
character sets--and even within a character set, ranges may cause results
you probably didn't expect.  A sound principle is to use only ranges
that begin from and end at either alphabets of equal case (a-e, A-E),
or digits (0-4).  Anything else is unsafe.  If in doubt, spell out the
character sets in full.

Options:

    c	Complement the SEARCHLIST.
    d	Delete found but unreplaced characters.
    s	Squash duplicate replaced characters.

If the C</c> modifier is specified, the SEARCHLIST character set
is complemented.  If the C</d> modifier is specified, any characters
specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
(Note that this is slightly more flexible than the behavior of some
B<tr> programs, which delete anything they find in the SEARCHLIST,
period.) If the C</s> modifier is specified, sequences of characters
that were transliterated to the same character are squashed down
to a single instance of the character.

If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
exactly as specified.  Otherwise, if the REPLACEMENTLIST is shorter
than the SEARCHLIST, the final character is replicated till it is long
enough.  If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.

Examples:

    $ARGV[1] =~ tr/A-Z/a-z/;	# canonicalize to lower case

    $cnt = tr/*/*/;		# count the stars in $_

    $cnt = $sky =~ tr/*/*/;	# count the stars in $sky

    $cnt = tr/0-9//;		# count the digits in $_

    tr/a-zA-Z//s;		# bookkeeper -> bokeper

    ($HOST = $host) =~ tr/a-z/A-Z/;

    tr/a-zA-Z/ /cs;		# change non-alphas to single space

    tr [\200-\377]
       [\000-\177];		# delete 8th bit

If multiple transliterations are given for a character, only the
first one is used:

    tr/AAA/XYZ/

will transliterate any A to X.
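
The rules for C</d>, C</s>, and a short or empty REPLACEMENTLIST are
easy to mix up, so here is a small sketch of each.  The sample strings
are arbitrary:

    (my $d = "a1b2c3")     =~ tr/0-9//d;   # "abc": digits deleted
    (my $r = "aabbcc")     =~ tr/abc/x/;   # "xxxxxx": x is replicated
    (my $e = "aabbcc")     =~ tr/abc/x/d;  # "xx": b and c are deleted
    (my $s = "bookkeeper") =~ tr/a-z//s;   # "bokeper"
    my $n = ($s =~ tr/e//);                # 2: counts, changes nothing
    print "$d $r $e $s $n\n";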

Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
interpolation.  That means that if you want to use variables, you
must use an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;

=back

=head2 Gory details of parsing quoted constructs

When presented with something that might have several different
interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
principle to pick the most probable interpretation.  This strategy
is so successful that Perl programmers often do not suspect the
ambivalence of what they write.  But from time to time, Perl's
notions differ substantially from what the author honestly meant.

This section hopes to clarify how Perl handles quoted constructs.
Although the most common reason to learn this is to unravel labyrinthine
regular expressions, because the initial steps of parsing are the
same for all quoting operators, they are all discussed together.

The most important Perl parsing rule is the first one discussed
below: when processing a quoted construct, Perl first finds the end
of that construct, then interprets its contents.  If you understand
this rule, you may skip the rest of this section on the first
reading.  The other rules are likely to contradict the user's
expectations much less frequently than this first one.

Some passes discussed below are performed concurrently, but because
their results are the same, we consider them individually.  For different
quoting constructs, Perl performs different numbers of passes, from
one to five, but these passes are always performed in the same order.

=over 4

=item Finding the end

The first pass is finding the end of the quoted construct, whether
it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
construct, a C</> that terminates a C<qq//> construct, a C<]> which
terminates a C<qq[]> construct, or a C<< > >> which terminates a
fileglob started with C<< < >>.

When searching for single-character non-pairing delimiters, such
as C</>, combinations of C<\\> and C<\/> are skipped.  However,
when searching for a single-character pairing delimiter like C<[>,
combinations of C<\\>, C<\]>, and C<\[> are all skipped, and nested
C<[>, C<]> are skipped as well.  When searching for multicharacter
delimiters, nothing is skipped.

For constructs with three-part delimiters (C<s///>, C<y///>, and
C<tr///>), the search is repeated once more.

During this search no attention is paid to the semantics of the construct.
Thus:

    "$hash{"$foo/$bar"}"

or:

    m/ 
      bar	# NOT a comment, this slash / terminated m//!
     /x

do not form legal quoted expressions.   The quoted part ends on the
first C<"> and C</>, and the rest happens to be a syntax error.
Because the slash that terminated C<m//> was followed by a C<SPACE>,
the example above is not C<m//x>, but rather C<m//> with no C</x>
modifier.  So the embedded C<#> is interpreted as a literal C<#>.
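
Both problems go away if the delimiter character never appears inside
the construct.  The following is a sketch of one way to do that; the
data and variable names are invented for the example:

    my $path = "/usr/bin/perl";
    if ($path =~ m{
            ^ /usr/bin/     # a slash is harmless with brace delimiters
            (\w+) $
        }x) {
        print "$1\n";       # perl
    }

    my %hash = ("a/b" => "found");
    my ($foo, $bar) = ("a", "b");
    my $key = "$foo/$bar";  # build the key outside the string
    print "$hash{$key}\n";  # found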

=item Removal of backslashes before delimiters

During the second pass, text between the starting and ending
delimiters is copied to a safe location, and the C<\> is removed
from combinations consisting of C<\> and delimiter--or delimiters,
meaning both starting and ending delimiters will be handled, should these differ.
This removal does not happen for multi-character delimiters.
Note that the combination C<\\> is left intact, just as it was.

Starting from this step no information about the delimiters is
used in parsing.

=item Interpolation

The next step is interpolation in the text obtained, which is now
delimiter-independent.  There are four different cases.

=over 4

=item C<<<'EOF'>, C<m''>, C<s'''>, C<tr///>, C<y///>

No interpolation is performed.

=item C<''>, C<q//>

The only interpolation is removal of C<\> from pairs C<\\>.

=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>

C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
converted to corresponding Perl constructs.  Thus, C<"$foo\Qbaz$bar">
is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
The other combinations are replaced with appropriate expansions.

Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
is interpolated in the usual way.  Something like C<"\Q\\E"> has
no C<\E> inside.  Instead, it has C<\Q>, C<\\>, and C<E>, so the
result is the same as for C<"\\\\E">.  As a general rule, backslashes
between C<\Q> and C<\E> may lead to counterintuitive results.  So,
C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
as C<"\\\t"> (since TAB is not alphanumeric).  Note also that:

  $str = '\t';
  return "\Q$str";

may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
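
The difference between the two forms can be checked directly.  This is
just a sketch, and the test string C<'a\tb'> is our own:

    my $tab = "\Q\t\E";     # a backslash followed by a real TAB
    my $str = '\t';
    my $lit = "\Q$str";     # backslash, backslash, "t"
    print length($tab), " ", length($lit), "\n";   # 2 3
    print "match\n" if 'a\tb' =~ /a\Q$str\Eb/;     # literal \t matched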

Interpolated scalars and arrays are converted internally to the C<join> and
C<.> catenation operations.  Thus, C<"$foo XXX '@arr'"> becomes:

  $foo . " XXX '" . (join $", @arr) . "'";

All operations above are performed simultaneously, left to right.
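
Since the array is joined with C<$">, changing that variable changes
the interpolated result.  A short sketch, with arbitrary values:

    my @arr = (1, 2, 3);
    my $foo = "n";
    {
        local $" = "-";             # the separator used by the join
        print "$foo XXX '@arr'\n";  # n XXX '1-2-3'
    }
    print "$foo XXX '@arr'\n";      # n XXX '1 2 3'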

Because the result of C<"\Q STRING \E"> has all metacharacters
quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair.  If protected by C<\>, C<$> will be quoted to become
C<"\\\$">; if not, it is interpreted as the start of an interpolated
scalar.

Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends.  For instance, whether 
C<< "a $b -> {c}" >> really means:

  "a " . $b . " -> {c}";

or:

  "a " . $b -> {c};

Most of the time, the choice is the longest possible text that does not
include spaces between components and which contains matching braces or
brackets.  Because the outcome may be determined by voting based
on heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
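
When in doubt, you can sidestep the guessing by writing out what you
mean.  A sketch, using a hypothetical hash reference C<$ref>:

    my $ref = { c => "value" };
    print "a $ref->{c}\n";          # dereferenced: a value
    print "a " . $ref . "->{c}\n";  # stringified: a HASH(0x...)->{c}
    print "a @{[ $ref->{c} ]}\n";   # any expression: a value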

=item C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>

Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
happens (almost) as with C<qq//> constructs, but the substitution
of C<\> followed by RE-special chars (including C<\>) is not
performed.  Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
a C<#>-comment in a C<//x>-regular expression, no processing is
performed whatsoever.  This is the first step at which the presence
of the C<//x> modifier is relevant.

Interpolation has several quirks: C<$|>, C<$(>, and C<$)> are not
interpolated, and constructs C<$var[SOMETHING]> are voted (by several
different estimators) to be either an array element or C<$var>
followed by an RE alternative.  This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
array element C<-9>, not as a regular expression from the variable
C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>.  Since voting among different estimators may occur,
the result is not predictable.

It is at this step that C<\1> is begrudgingly converted to C<$1> in
the replacement text of C<s///> to correct the incorrigible
I<sed> hackers who haven't picked up the saner idiom yet.  A warning
is emitted if the C<use warnings> pragma or the B<-w> command-line flag
(that is, the C<$^W> variable) was set.

The lack of processing of C<\\> creates specific restrictions on
the post-processed text.  If the delimiter is C</>, one cannot get
the combination C<\/> into the result of this step.  C</> will
finish the regular expression, C<\/> will be stripped to C</> on
the previous step, and C<\\/> will be left as is.  Because C</> is
equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
alphanumeric char, as in:

  m m ^ a \s* b mmx;

In the RE above, which is intentionally obfuscated for illustration, the
delimiter is C<m>, the modifier is C<mx>, and after backslash-removal the
RE is the same as for C<m/ ^ a \s* b /mx>.  There's more than one
reason you're encouraged to restrict your delimiters to non-alphanumeric,
non-whitespace choices.

=back

This step is the last one for all constructs except regular expressions,
which are processed further.

=item Interpolation of regular expressions

Previous steps were performed during the compilation of Perl code,
but this one happens at run time--although it may be optimized to
be calculated at compile time if appropriate.  After the preprocessing
described above, and possibly after evaluation if catenation,
joining, casing translation, or metaquoting are involved, the
resulting I<string> is passed to the RE engine for compilation.

Whatever happens in the RE engine might be better discussed in L<perlre>,
but for the sake of continuity, we shall do so here.

This is another step where the presence of the C<//x> modifier is
relevant.  The RE engine scans the string from left to right and
converts it to a finite automaton.

Backslashed characters are either replaced with corresponding
literal strings (as with C<\{>), or else they generate special nodes
in the finite automaton (as with C<\b>).  Characters special to the
RE engine (such as C<|>) generate corresponding nodes or groups of
nodes.  C<(?#...)> comments are ignored.  All the rest is either
converted to literal strings to match, or else is ignored (as are
whitespace and C<#>-style comments if C<//x> is present).

Parsing of the bracketed character class construct, C<[...]>, is
rather different than the rule used for the rest of the pattern.
The terminator of this construct is found using the same rules as
for finding the terminator of a C<{}>-delimited construct, the only
exception being that C<]> immediately following C<[> is treated as
though preceded by a backslash.  Similarly, the terminator of
C<(?{...})> is found using the same rules as for finding the
terminator of a C<{}>-delimited construct.

It is possible to inspect both the string given to the RE engine and the
resulting finite automaton.  See the arguments C<debug>/C<debugcolor>
in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
switch documented in L<perlrun/"Command Switches">.
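
A minimal sketch of the pragma in use follows.  The pattern is
arbitrary, and the exact format of the debugging output varies between
perl versions, so none is shown:

    {
        use re 'debug';     # lexically scoped; output goes to STDERR
        my $re = qr/fo+(?:bar|baz)/;
        "foobaz" =~ $re;
    }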

=item Optimization of regular expressions

This step is listed for completeness only.  Since it does not change
semantics, details of this step are not documented and are subject
to change without notice.  This step is performed over the finite
automaton that was generated during the previous pass.

It is at this stage that C<split()> silently optimizes C</^/> to
mean C</^/m>.
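
The effect of that optimization is easy to observe.  In this sketch
(the sample text is ours), the string is split at the start of every
line, not just at the start of the string:

    my @lines = split /^/, "one\ntwo\nthree\n";
    print scalar(@lines), "\n";     # 3
    print $lines[1];                # two (with its newline)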

=back

=head2 I/O Operators

There are several I/O operators you should know about.

A string enclosed by backticks (grave accents) first undergoes
double-quote interpolation.  It is then interpreted as an external
command, and the output of that command is the value of the
backtick string, like in a shell.  In scalar context, a single string
consisting of all output is returned.  In list context, a list of
values is returned, one per line of output.  (You can set C<$/> to use
a different line terminator.)  The command is executed each time the
pseudo-literal is evaluated.  The status value of the command is
returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
Unlike in B<csh>, no translation is done on the return data--newlines
remain newlines.  Unlike in any of the shells, single quotes do not
hide variable names in the command from interpretation.  To pass a
literal dollar-sign through to the shell you need to hide it with a
backslash.  The generalized form of backticks is C<qx//>.  (Because
backticks always undergo shell expansion as well, see L<perlsec> for
security concerns.)
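
As a sketch of checking the status, the following runs the current
perl (C<$^X>) as a convenient external command.  The command itself is
only a stand-in, and the example assumes that it exits normally rather
than being killed by a signal:

    my $out = `$^X -e "print 42; exit 3"`;
    my $status = $? >> 8;           # exit code is in the high byte
    print "output=$out status=$status\n";   # output=42 status=3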

In scalar context, evaluating a filehandle in angle brackets yields
the next line from that file (the newline, if any, included), or
C<undef> at end-of-file or on error.  When C<$/> is set to C<undef>
(sometimes known as file-slurp mode) and the file is empty, it
returns C<''> the first time, followed by C<undef> subsequently.

Ordinarily you must assign the returned value to a variable, but
there is one situation where an automatic assignment happens.  If
and only if the input symbol is the only thing inside the conditional
of a C<while> statement (even if disguised as a C<for(;;)> loop),
the value is automatically assigned to the global variable $_,
destroying whatever was there previously.  (This may seem like an
odd thing to you, but you'll use the construct in almost every Perl
script you write.)  The $_ variable is not implicitly localized.
You'll have to put a C<local $_;> before the loop if you want that
to happen.

The following lines are equivalent:

    while (defined($_ = <STDIN>)) { print; }
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while ($_ = <STDIN>);
    print while <STDIN>;

This also behaves similarly, but avoids $_ :

    while (my $line = <STDIN>) { print $line }    

In these loop constructs, the assigned value (whether assignment
is automatic or explicit) is then tested to see whether it is
defined.  The defined test avoids problems where a line has a string
value that would be treated as false by Perl, for example a "" or
a "0" with no trailing newline.  If you really mean for such values
to terminate the loop, they should be tested for explicitly:

    while (($_ = <STDIN>) ne '0') { ... }
    while (<STDIN>) { last unless $_; ... }

In other boolean contexts, C<< <I<filehandle>> >> without an
explicit C<defined> test or comparison elicits a warning if the
C<use warnings> pragma or the B<-w> command-line switch (the C<$^W>
variable) is in effect.
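
The implicit C<defined> test in the loop constructs can be seen with
input whose last line is a bare C<0>.  This sketch reads from an
in-memory filehandle, which is an assumption of the example (it needs
perl 5.8 or later); any filehandle behaves the same way:

    open(my $fh, '<', \"1\n0") or die "Can't open: $!";
    my $count = 0;
    while (my $line = <$fh>) {      # really: defined(my $line = <$fh>)
        $count++;
    }
    print "$count\n";               # 2: the final "0" is not skipped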

The filehandles STDIN, STDOUT, and STDERR are predefined.  (The
filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
in packages, where they would be interpreted as local identifiers
rather than global.)  Additional filehandles may be created with
the open() function, amongst others.  See L<perlopentut> and
L<perlfunc/open> for details on this.

If a <FILEHANDLE> is used in a context that is looking for
a list, a list comprising all input lines is returned, one line per
list element.  It's easy to grow to a rather large data space this
way, so use with care.

<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
See L<perlfunc/readline>.

The null filehandle <> is special: it can be used to emulate the
behavior of B<sed> and B<awk>.  Input from <> comes either from
standard input, or from each file listed on the command line.  Here's
how it works: the first time <> is evaluated, the @ARGV array is
checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
gives you standard input.  The @ARGV array is then processed as a list
of filenames.  The loop

    while (<>) {
	...			# code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
	open(ARGV, $ARGV);
	while (<ARGV>) {
	    ...		# code for each line
	}
    }

except that it isn't so cumbersome to say, and will actually work.
It really does shift the @ARGV array and put the current filename
into the $ARGV variable.  It also uses filehandle I<ARGV>
internally--<> is just a synonym for <ARGV>, which
is magical.  (The pseudo code above doesn't work because it treats
<ARGV> as non-magical.)

You can modify @ARGV before the first <> as long as the array ends up
containing the list of filenames you really want.  Line numbers (C<$.>)
continue as though the input were one big happy file.  See the example
in L<perlfunc/eof> for how to reset line numbers on each file.
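
As a hedged sketch of that technique, the following sets up two
throwaway files with File::Temp (an assumption made only to keep the
example self-contained), feeds them to <> through a localized @ARGV,
and closes ARGV at the end of each file so that C<$.> starts over.

```perl

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Two temporary input files; their names are generated, not fixed.
    my @names;
    for my $text ("a\nb\n", "c\n") {
        my ($out, $name) = tempfile(UNLINK => 1);
        print {$out} $text;
        close($out) or die "can't close $name: $!";
        push @names, $name;
    }

    my @seen;
    {
        local @ARGV = @names;       # <> shifts filenames off this copy
        while (<>) {
            chomp;
            push @seen, join(':', $., $_);  # line number, then the line
        }
        continue {
            close ARGV if eof;      # closing ARGV resets $. per file
        }
    }
    print "@seen\n";

```

Without the C<close ARGV if eof> line, the third entry would carry line
number 3 rather than 1.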

If you want to set @ARGV to your own list of files, go right ahead.  
This sets @ARGV to all plain text files if no @ARGV was given:

    @ARGV = grep { -f && -T } glob('*') unless @ARGV;

You can even set them to pipe commands.  For example, this automatically
filters compressed arguments through B<gzip>:

    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;

If you want to pass switches into your script, you can use one of the
Getopts modules or put a loop on the front like this:

    while ($_ = $ARGV[0], /^-/) {
	shift;
        last if /^--$/;
	if (/^-D(.*)/) { $debug = $1 }
	if (/^-v/)     { $verbose++  }
	# ...		# other switches
    }

    while (<>) {
	# ...		# code for each line
    }

The <> symbol will return C<undef> for end-of-file only once.  
If you call it again after this, it will assume you are processing another 
@ARGV list, and if you haven't set @ARGV, will read input from STDIN.

If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the
filehandle to input from, or its typeglob, or a reference to the
same.  For example:

    $fh = \*STDIN;
    $line = <$fh>;

If what's within the angle brackets is neither a filehandle nor a simple
scalar variable containing a filehandle name, typeglob, or typeglob
reference, it is interpreted as a filename pattern to be globbed, and
either a list of filenames or the next filename in the list is returned,
depending on context.  This distinction is determined on syntactic
grounds alone.  That means C<< <$x> >> is always a readline() from
an indirect handle, but C<< <$hash{key}> >> is always a glob().
That's because $x is a simple scalar variable, but C<$hash{key}> is
not--it's a hash element.
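
A practical consequence is that a filehandle stored in a hash cannot be
read with angle brackets directly.  The sketch below shows two ways
around that; it assumes Perl 5.8 or later for the in-memory filehandle,
and the hash and variable names are hypothetical.

```perl

    use strict;
    use warnings;

    my $text = "one\ntwo\n";
    open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
    my %handle = (in => $fh);

    # <$handle{in}> would be parsed as a glob, so call readline() by name...
    my $line1 = readline($handle{in});

    # ...or copy the handle into a simple scalar variable first.
    my $in    = $handle{in};
    my $line2 = <$in>;
    close($in);

    print $line1, $line2;

```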

One level of double-quote interpretation is done first, but you can't
say C<< <$foo> >> because that's an indirect filehandle as explained
in the previous paragraph.  (In older versions of Perl, programmers
would insert curly brackets to force interpretation as a filename glob:
C<< <${foo}> >>.  These days, it's considered cleaner to call the
internal function directly as C<glob($foo)>, which is probably the right
way to have done it in the first place.)  For example:

    while (<*.c>) {
	chmod 0644, $_;
    }

is roughly equivalent to:

    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
	chomp;
	chmod 0644, $_;
    }

except that the globbing is actually done internally using the standard
C<File::Glob> extension.  Of course, the shortest way to do the above is:

    chmod 0644, <*.c>;

A (file)glob evaluates its (embedded) argument only when it is
starting a new list.  All values must be read before it will start
over.  In list context, this isn't important because you automatically
get them all anyway.  However, in scalar context the operator returns
the next value each time it's called, or C<undef> when the list has
run out.  As with filehandle reads, an automatic C<defined> is
generated when the glob occurs in the test part of a C<while>,
because legal glob returns (e.g. a file called F<0>) would otherwise
terminate the loop.  Again, C<undef> is returned only once.  So if
you're expecting a single value from a glob, it is much better to
say

    ($file) = <blurch*>;

than

    $file = <blurch*>;

because the latter will alternate between returning a filename and
returning false.
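
The alternation is easy to observe.  This sketch assumes a temporary
directory from File::Temp whose path contains no whitespace or glob
metacharacters, and it uses the glob() function so that the directory
name can be interpolated.

```perl

    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    my $dir = tempdir(CLEANUP => 1);
    open(my $out, '>', "$dir/blurch1") or die "can't create file: $!";
    close($out);

    my (@scalar_results, @list_results);
    for (1 .. 4) {
        my $s = glob("$dir/blurch*");       # scalar context: an iterator
        push @scalar_results, defined $s ? 'name' : 'undef';

        my ($l) = glob("$dir/blurch*");     # list context: first match
        push @list_results, defined $l ? 'name' : 'undef';
    }
    print "scalar: @scalar_results\n";
    print "list:   @list_results\n";

```

Each glob in the loop keeps its own position, so the scalar form
yields the filename on one pass and C<undef> on the next, while the
list assignment finds the file every time.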

If you're trying to do variable interpolation, it's definitely better
to use the glob() function, because the older notation can cause people
to become confused with the indirect filehandle notation.

    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);

=head2 Constant Folding

Like C, Perl does a certain amount of expression evaluation at
compile time whenever it determines that all arguments to an
operator are static and have no side effects.  In particular, string
concatenation happens at compile time between literals that don't do
variable substitution.  Backslash interpolation also happens at
compile time.  You can say

    'Now is the time for all' . "\n" .
	'good men to come to.'

and this all reduces to one string internally.  Likewise, if
you say

    foreach $file (@filenames) {
	if (-s $file > 5 + 100 * 2**16) {  }
    }

the compiler will precompute the number which that expression
represents so that the interpreter won't have to.

=head2 Bitwise String Operators

Bitstrings of any size may be manipulated by the bitwise operators
(C<~ | & ^>).

If the operands to a binary bitwise op are strings of different
sizes, B<|> and B<^> ops act as though the shorter operand had
additional zero bits on the right, while the B<&> op acts as though
the longer operand were truncated to the length of the shorter.
The granularity for such extension or truncation is one or more
bytes.

    # ASCII-based examples 
    print "j p \n" ^ " a h";        	# prints "JAPH\n"
    print "JA" | "  ph\n";          	# prints "japh\n"
    print "japh\nJunk" & '_____';   	# prints "JAPH\n";
    print 'p N$' ^ " E<H\n";		# prints "Perl\n";

If you are intending to manipulate bitstrings, be certain that
you're supplying bitstrings: If an operand is a number, that will imply
a B<numeric> bitwise operation.  You may explicitly show which type of
operation you intend by using C<""> or C<0+>, as in the examples below.

    $foo =  150  |  105 ;	# yields 255  (0x96 | 0x69 is 0xFF)
    $foo = '150' |  105 ;	# yields 255
    $foo =  150  | '105';	# yields 255
    $foo = '150' | '105';	# yields string '155' (under ASCII)

    $baz = 0+$foo & 0+$bar;	# both ops explicitly numeric
    $biz = "$foo" ^ "$bar";	# both ops explicitly stringy

See L<perlfunc/vec> for information on how to manipulate individual bits
in a bit vector.
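
The padding and truncation rules, and the difference between numeric
and string operands, can be checked with a short sketch.  It assumes an
ASCII platform and the traditional operator behavior described above
(that is, the C<bitwise> feature of newer Perls is not enabled).

```perl

    use strict;
    use warnings;

    my $padded    = "AB" | " ";         # shorter operand zero-padded: "aB"
    my $truncated = "abcd" & "__";      # longer operand truncated:    "AB"

    my $numeric   = 150 | 105;          # numeric operands: 255
    my $stringy   = "150" | "105";      # string operands:  "155"
    my $forced    = (0 + "150") | (0 + "105");  # forced numeric: 255

    print "$padded $truncated $numeric $stringy $forced\n";

```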

=head2 Integer Arithmetic

By default, Perl assumes that it must do most of its arithmetic in
floating point.  But by saying

    use integer;

you may tell the compiler that it's okay to use integer operations
(if it feels like it) from here to the end of the enclosing BLOCK.
An inner BLOCK may countermand this by saying

    no integer;

which lasts until the end of that BLOCK.  Note that this doesn't
mean everything is only an integer, merely that Perl may use integer
operations if it is so inclined.  For example, even under C<use
integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
or so.

Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
and ">>") always produce integral results.  (But see also 
L<Bitwise String Operators>.)  However, C<use integer> still has meaning for
them.  By default, their results are interpreted as unsigned integers, but
if C<use integer> is in effect, their results are interpreted
as signed integers.  For example, C<~0> usually evaluates to a large
integral value.  However, C<use integer; ~0> is C<-1> on twos-complement
machines.
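
A small sketch of the block scoping and of the signed interpretation,
assuming a twos-complement machine; the variable names are
illustrative.

```perl

    use strict;
    use warnings;

    my $unsigned = ~0;              # large unsigned value by default
    my ($signed, $quotient);
    {
        use integer;                # in effect to the end of this block
        $signed   = ~0;             # -1 on twos-complement machines
        $quotient = 7 / 2;          # integer division: 3
    }
    my $float_quotient = 7 / 2;     # outside the block: 3.5

    print "$unsigned $signed $quotient $float_quotient\n";

```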

=head2 Floating-point Arithmetic

While C<use integer> provides integer-only arithmetic, there is no
analogous mechanism to provide automatic rounding or truncation to a
certain number of decimal places.  For rounding to a certain number
of digits, sprintf() or printf() is usually the easiest route.
See L<perlfaq4>.

Floating-point numbers are only approximations to what a mathematician
would call real numbers.  There are infinitely more reals than floats,
so some corners must be cut.  For example:

    printf "%.20g\n", 123456789123456789;
    #        produces 123456789123456784

Testing for exact floating-point equality or inequality is
not a good idea.  Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
decimal places.  See Knuth, volume II, for a more robust treatment of
this topic.

    sub fp_equal {
	my ($X, $Y, $POINTS) = @_;
	my ($tX, $tY);
	$tX = sprintf("%.${POINTS}g", $X);
	$tY = sprintf("%.${POINTS}g", $Y);
	return $tX eq $tY;
    }
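
As a usage sketch, the classic case of C<0.1 + 0.2> can be run through
a copy of the function above.  Whether the exact C<==> test fails
depends on the platform's floating-point format (it does with ordinary
IEEE doubles), so the example prints that result rather than assuming
it.

```perl

    use strict;
    use warnings;

    # Same logic as fp_equal() above, repeated so this runs on its own.
    sub fp_equal {
        my ($x, $y, $points) = @_;
        return sprintf("%.${points}g", $x) eq sprintf("%.${points}g", $y);
    }

    my $sum   = 0.1 + 0.2;
    my $exact = ($sum == 0.3) ? "equal" : "not equal";
    my $close = fp_equal($sum, 0.3, 10) ? "equal" : "not equal";

    printf "sum to 17 digits:  %.17g\n", $sum;
    print  "exact comparison:  $exact\n";
    print  "10-digit compare:  $close\n";

```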

The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and other mathematical and trigonometric functions.
The Math::Complex module (part of the standard perl distribution)
defines mathematical functions that work on both the reals and the
imaginary numbers.  Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely.  In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

=head2 Bigger Numbers

The standard Math::BigInt and Math::BigFloat modules provide
variable-precision arithmetic and overloaded operators, although
they're currently pretty slow. At the cost of some space and
considerable speed, they avoid the normal pitfalls associated with
limited-precision representations.

    use Math::BigInt;
    $x = Math::BigInt->new('123456789123456789');
    print $x * $x;

    # prints +15241578780673678515622620750190521
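
For comparison, here is a sketch that computes the same square with
native arithmetic and with Math::BigInt.  The native result is printed
without a claimed value because it depends on the platform's number
format.  The check that accompanies this sketch compares against the
digits shown above without the leading plus sign, which is how recent
Math::BigInt releases render positive numbers.

```perl

    use strict;
    use warnings;
    use Math::BigInt;

    my $big    = Math::BigInt->new('123456789123456789');
    my $square = $big * $big;       # overloaded, arbitrary precision

    # Native arithmetic falls back to floating point and loses digits.
    my $native = 123456789123456789 * 123456789123456789;

    print  "BigInt: $square\n";
    printf "native: %.0f\n", $native;

```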

There are several modules that let you calculate with unlimited
precision (bound only by memory and CPU time) or with a fixed
precision.  There are also some non-standard modules that provide
faster implementations via external C libraries.

Here is a short but incomplete summary:

	Math::Fraction		big, unlimited fractions like 9973 / 12967
	Math::String		treat string sequences like numbers
	Math::FixedPrecision	calculate with a fixed precision
	Math::Currency		for currency calculations
	Bit::Vector		manipulate bit vectors fast (uses C)
	Math::BigIntFast	Bit::Vector wrapper for big numbers
	Math::Pari		provides access to the Pari C library
	Math::BigInteger	uses an external C library
	Math::Cephes		uses external Cephes C library (no big numbers)
	Math::Cephes::Fraction	fractions via the Cephes library
	Math::GMP		another one using an external C library

Choose wisely.

=cut

=head1 NAME

perlopentut - tutorial on opening things in Perl

=head1 DESCRIPTION

Perl has two simple, built-in ways to open files: the shell way for
convenience, and the C way for precision.  The choice is yours.

=head1 Open E<agrave> la shell

Perl's C<open> function was designed to mimic the way command-line
redirection in the shell works.  Here are some basic examples
from the shell:

    $ myprogram file1 file2 file3
    $ myprogram    <  inputfile
    $ myprogram    >  outputfile
    $ myprogram    >> outputfile
    $ myprogram    |  otherprogram 
    $ otherprogram |  myprogram

And here are some more advanced examples:

    $ otherprogram      | myprogram f1 - f2
    $ otherprogram 2>&1 | myprogram -
    $ myprogram     <&3
    $ myprogram     >&4

Programmers accustomed to constructs like those above can take comfort
in learning that Perl directly supports these familiar constructs using
virtually the same syntax as the shell.

=head2 Simple Opens

The C<open> function takes two arguments: the first is a filehandle,
and the second is a single string comprising both what to open and how
to open it.  C<open> returns true when it works, and when it fails,
returns a false value and sets the special variable $! to reflect
the system error.  If the filehandle was previously opened, it will
be implicitly closed first.

For example:

    open(INFO,      "datafile") || die("can't open datafile: $!");
    open(INFO,   "<  datafile") || die("can't open datafile: $!");
    open(RESULTS,">  runstats") || die("can't open runstats: $!");
    open(LOG,    ">> logfile ") || die("can't open logfile:  $!");

If you prefer the low-punctuation version, you could write that this way:

    open INFO,   "<  datafile"  or die "can't open datafile: $!";
    open RESULTS,">  runstats"  or die "can't open runstats: $!";
    open LOG,    ">> logfile "  or die "can't open logfile:  $!";

A few things to notice.  First, the leading less-than is optional.
If omitted, Perl assumes that you want to open the file for reading.

The other important thing to notice is that, just as in the shell,
any white space before or after the filename is ignored.  This is good,
because you wouldn't want these to do different things:

    open INFO,   "<datafile"   
    open INFO,   "< datafile" 
    open INFO,   "<  datafile"

Ignoring surrounding whitespace also helps for when you read a filename in
from a different file, and forget to trim it before opening:

    $filename = <INFO>;         # oops, \n still there
    open(EXTRA, "< $filename") || die "can't open $filename: $!";

This is not a bug, but a feature.  Because C<open> mimics the shell in
its style of using redirection arrows to specify how to open the file, it
also does so with respect to extra white space around the filename
itself.  For accessing files with naughty names, see
L<"Dispelling the Dweomer">.
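
The trimming can be observed with a sketch like the following.  It
creates a temporary file through File::Temp (an assumption for
self-containment) and, for contrast, also tries the three-argument form
of C<open>, which is not covered in this tutorial and takes the
filename literally.

```perl

    use strict;
    use File::Temp qw(tempfile);

    my ($tmp, $name) = tempfile(UNLINK => 1);
    print {$tmp} "hello\n";
    close($tmp) or die "can't close $name: $!";

    my $raw = "$name\n";            # as if read from a file, not chomped

    # Two-argument open trims the trailing newline and finds the file.
    my $two_arg_ok = open(my $in2, "< $raw") ? 1 : 0;

    # Three-argument open (an assumption: Perl 5.6 or later) does not trim.
    my $three_arg_ok = open(my $in3, '<', $raw) ? 1 : 0;

    print "two-argument open:   $two_arg_ok\n";
    print "three-argument open: $three_arg_ok\n";

```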

=head2 Pipe Opens

In C, when you want to open a file using the standard I/O library,
you use the C<fopen> function, but when opening a pipe, you use the
C<popen> function.  But in the shell, you just use a different redirection
character.  That's also the case for Perl.  The C<open> call 
remains the same--just its argument differs.  

If the leading character is a pipe symbol, C<open> starts up a new
command and opens a write-only filehandle leading into that command.
This lets you write into that handle and have what you write show up on
that command's standard input.  For example:

    open(PRINTER, "| lpr -Plp1")    || die "cannot fork: $!";
    print PRINTER "stuff\n";
    close(PRINTER)                  || die "can't close lpr: $!";

If the trailing character is a pipe, you start up a new command and open a
read-only filehandle leading out of that command.  This lets whatever that
command writes to its standard output show up on your handle for reading.
For example:

    open(NET, "netstat -i -n |")    || die "cannot fork: $!";
    while (<NET>) { }               # do something with input
    close(NET)                      || die "can't close netstat: $!";

What happens if you try to open a pipe to or from a non-existent command?
In most systems, such an C<open> will not return an error. That's
because in the traditional C<fork>/C<exec> model, running the other
program happens only in the forked child process, which means that
the failed C<exec> can't be reflected in the return value of C<open>.
Only a failed C<fork> shows up there.  See 
L<perlfaq8/"Why doesn't open() return an error when a pipe open fails?"> 
to see how to cope with this.  There's also an explanation in L<perlipc>.
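
One hedged way to notice such a failure is to check the return value of
C<close> as well as that of C<open>.  The sketch assumes a Unix-like
system with a Bourne-style shell and F</dev/null>, and the command name
is hypothetical, chosen so that it should not exist.  It checks both
calls because which one reports the problem may vary with the platform
and Perl version.

```perl

    use strict;
    use warnings;

    # Hypothetical command name that is not expected to exist.
    my $cmd = 'no_such_command_for_this_demo';

    my $failed = 0;
    if (open(my $pipe, "$cmd 2>/dev/null |")) {
        my @output = <$pipe>;           # nothing to read from a dead command
        # close() on a pipe is false if the child exited unsuccessfully
        close($pipe) or $failed = 1;
    }
    else {
        $failed = 1;                    # the open (fork) itself failed
    }
    print $failed ? "command failed\n" : "command ran\n";

```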

If you would like to open a bidirectional pipe, the IPC::Open2
library will handle this for you.  Check out
L<perlipc/"Bidirectional Communication with Another Process">.

=head2 The Minus File

Again following the lead of the standard shell utilities, Perl's
C<open> function treats a file whose name is a single minus, "-", in a
special way.  If you open minus for reading, it really means to access
the standard input.  If you open minus for writing, it really means to
access the standard output.

If minus can be used as the default input or default output, what happens
if you open a pipe into or out of minus?  What's the default command it
would run?  The same script as you're currently running!  This is actually
a stealth C<fork> hidden inside an C<open> call.  See 
L<perlipc/"Safe Pipe Opens"> for details.

=head2 Mixing Reads and Writes

It is possible to specify both read and write access.  All you do is
add a "+" symbol in front of the redirection.  But as in the shell,
using a less-than on a file never creates a new file; it only opens an
existing one.  On the other hand, using a greater-than always clobbers
(truncates to zero length) an existing file, or creates a brand-new one
if there isn't an old one.  Adding a "+" for read-write doesn't affect
whether it only works on existing files or always clobbers existing ones.

    open(WTMP, "+< /usr/adm/wtmp") 
        || die "can't open /usr/adm/wtmp: $!";

    open(SCREEN, "+> /tmp/lkscreen")
        || die "can't open /tmp/lkscreen: $!";

    open(LOGFILE, "+>> /tmp/applog")
        || die "can't open /tmp/applog: $!";

The first one won't create a new file, and the second one will always
clobber an old one.  The third one will create a new file if necessary
and not clobber an old one, and it will allow you to read at any point
in the file, but all writes will always go to the end.  In short,
the first case is substantially more common than the second and third
cases, which are almost always wrong.  (If you know C, the plus in
Perl's C<open> is historically derived from the one in C's fopen(3S),
which it ultimately calls.)
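
Here is a minimal sketch of the first, most common case: opening an
existing file with "+<", reading from it, and overwriting bytes in
place.  The temporary file comes from File::Temp, an assumption made so
the example can run anywhere; note the C<seek> between the read and the
write.

```perl

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my ($tmp, $path) = tempfile(UNLINK => 1);
    print {$tmp} "AAAA\nBBBB\n";
    close($tmp) or die "can't close $path: $!";

    open(my $fh, "+< $path") or die "can't update $path: $!";
    my $first = <$fh>;                  # read the first line
    seek($fh, 0, 0) or die "can't seek: $!";
    print {$fh} "ZZZZ";                 # overwrite the first four bytes
    close($fh) or die "can't close $path: $!";

    open(my $check, "< $path") or die "can't reopen $path: $!";
    my $content = do { local $/; <$check> };    # slurp the whole file
    close($check);
    print $content;

```

The file keeps its length and its second line; only the bytes written
after the C<seek> change.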

In fact, when it comes to updating a file, unless you're working on
a binary file as in the WTMP case above, you probably don't want to
use this approach for updating.  Instead, Perl's B<-i> flag comes to
the rescue.  The following command takes all the C, C++, or yacc source
or header files and changes all their foo's to bar's, leaving
the old version in the original file name with a ".orig" tacked
on the end:

    $ perl -i.orig -pe 's/\bfoo\b/bar/g' *.[Cchy]

This is a shortcut for some renaming games that are really
the best way to update text files.  See the second question in
L<perlfaq5> for more details.

=head2 Filters 

One of the most common uses for C<open> is one you never
even notice.  When you process the ARGV filehandle using
C<< <ARGV> >>, Perl actually does an implicit open 
on each file in @ARGV.  Thus a program called like this:

    $ myprogram file1 file2 file3

can have all its files opened and processed one at a time
using a construct no more complex than:

    while (<>) {
        # do something with $_
    } 

If @ARGV is empty when the loop first begins, Perl pretends you've opened
up minus, that is, the standard input.  In fact, $ARGV, the currently
open file during C<< <ARGV> >> processing, is even set to "-"
in these circumstances.

You are welcome to pre-process your @ARGV before starting the loop to
make sure it's to your liking.  One reason to do this might be to remove
command options beginning with a minus.  While you can always roll the
simple ones by hand, the Getopts modules are good for this.

    use Getopt::Std;

    # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o
    getopts("vDo:");            

    # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o}
    getopts("vDo:", \%args);    

Or the standard Getopt::Long module to permit named arguments:

    use Getopt::Long;
    GetOptions( "verbose"  => \$verbose,        # --verbose
                "Debug"    => \$debug,          # --Debug
                "output=s" => \$output );       
	    # --output=somestring or --output somestring

Another reason for preprocessing arguments is to make an empty
argument list default to all files:

    @ARGV = glob("*") unless @ARGV;

You could even filter out all but plain, text files.  This is a bit
silent, of course, and you might prefer to mention them on the way.

    @ARGV = grep { -f && -T } @ARGV;

If you're using the B<-n> or B<-p> command-line options, you
should put changes to @ARGV in a C<BEGIN{}> block.

Remember that a normal C<open> has special properties, in that it might
call fopen(3S) or it might call popen(3S), depending on what its
argument looks like; that's why it's sometimes called "magic open".
Here's an example:

    $pwdinfo = `domainname` =~ /^(\(none\))?$/
                    ? '< /etc/passwd'
                    : 'ypcat passwd |';

    open(PWD, $pwdinfo)                 
                or die "can't open $pwdinfo: $!";

This sort of thing also comes into play in filter processing.  Because
C<< <ARGV> >> processing employs the normal, shell-style Perl C<open>,
it respects all the special things we've already seen:

    $ myprogram f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile

That program will read from the file F<f1>, the process F<cmd1>, standard
input (F<tmpfile> in this case), the F<f2> file, the F<cmd2> command,
and finally the F<f3> file.

Yes, this also means that if you have files named "-" (and so on) in
your directory, they won't be processed as literal files by C<open>.
You'll need to pass them as "./-" much as you would for the I<rm> program.
Or you could use C<sysopen> as described below.

One of the more interesting applications is to change files of a certain
name into pipes.  For example, to autoprocess gzipped or compressed
files by decompressing them with I<gzip>:

    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_  } @ARGV;

Or, if you have the I<GET> program installed from LWP,
you can fetch URLs before processing them:

    @ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV;

It's not for nothing that this is called magic C<< <ARGV> >>.
Pretty nifty, eh?

=head1 Open E<agrave> la C

If you want the convenience of the shell, then Perl's C<open> is
definitely the way to go.  On the other hand, if you want finer precision
than C's simplistic fopen(3S) provides, then you should look to Perl's
C<sysopen>, which is a direct hook into the open(2) system call.
That does mean it's a bit more involved, but that's the price of 
precision.

C<sysopen> takes 3 (or 4) arguments.

    sysopen HANDLE, PATH, FLAGS, [MASK]

The HANDLE argument is a filehandle just as with C<open>.  The PATH is
a literal path, one that doesn't pay attention to any greater-thans or
less-thans or pipes or minuses, nor ignore white space.  If it's there,
it's part of the path.  The FLAGS argument contains one or more values
derived from the Fcntl module that have been or'd together using the
bitwise "|" operator.  The final argument, the MASK, is optional; if
present, it is combined with the user's current umask for the creation
mode of the file.  You should usually omit this.

Although the traditional values of read-only, write-only, and read-write
are 0, 1, and 2 respectively, this is known not to hold true on some
systems.  Instead, it's best to load in the appropriate constants first
from the Fcntl module, which supplies the following standard flags:

    O_RDONLY            Read only
    O_WRONLY            Write only
    O_RDWR              Read and write
    O_CREAT             Create the file if it doesn't exist
    O_EXCL              Fail if the file already exists
    O_APPEND            Append to the file
    O_TRUNC             Truncate the file
    O_NONBLOCK          Non-blocking access

Less common flags that are sometimes available on some operating
systems include C<O_BINARY>, C<O_TEXT>, C<O_SHLOCK>, C<O_EXLOCK>,
C<O_DEFER>, C<O_SYNC>, C<O_ASYNC>, C<O_DSYNC>, C<O_RSYNC>,
C<O_NOCTTY>, C<O_NDELAY> and C<O_LARGEFILE>.  Consult your open(2)
manpage or its local equivalent for details.  (Note: starting from
Perl release 5.6 the O_LARGEFILE flag, if available, is automatically
added to the sysopen() flags because large files are the default.)

Here's how to use C<sysopen> to emulate the simple C<open> calls we had
before.  We'll omit the C<|| die $!> checks for clarity, but make sure
you always check the return values in real code.  These aren't quite
the same, since C<open> will trim leading and trailing white space,
but you'll get the idea:

To open a file for reading:

    open(FH, "< $path");
    sysopen(FH, $path, O_RDONLY);

To open a file for writing, creating a new file if needed or else truncating
an old file:

    open(FH, "> $path");
    sysopen(FH, $path, O_WRONLY | O_TRUNC | O_CREAT);

To open a file for appending, creating one if necessary:

    open(FH, ">> $path");
    sysopen(FH, $path, O_WRONLY | O_APPEND | O_CREAT);

To open a file for update, where the file must already exist:

    open(FH, "+< $path");
    sysopen(FH, $path, O_RDWR);

And here are things you can do with C<sysopen> that you cannot do with
a regular C<open>.  As you see, it's just a matter of controlling the
flags in the third argument.

To open a file for writing, creating a new file which must not previously
exist:

    sysopen(FH, $path, O_WRONLY | O_EXCL | O_CREAT);

To open a file for appending, where that file must already exist:

    sysopen(FH, $path, O_WRONLY | O_APPEND);

To open a file for update, creating a new file if necessary:

    sysopen(FH, $path, O_RDWR | O_CREAT);

To open a file for update, where that file must not already exist:

    sysopen(FH, $path, O_RDWR | O_EXCL | O_CREAT);

To open a file without blocking, creating one if necessary:

    sysopen(FH, $path, O_WRONLY | O_NONBLOCK | O_CREAT);
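
The effect of C<O_EXCL> is easy to demonstrate.  In this sketch the
directory comes from File::Temp and the file name F<lockfile> is
hypothetical; the second attempt to create the same file is expected to
fail because the file already exists.

```perl

    use strict;
    use warnings;
    use Fcntl;                          # exports the O_* constants
    use File::Temp qw(tempdir);

    my $dir  = tempdir(CLEANUP => 1);
    my $path = "$dir/lockfile";         # hypothetical file name

    # First exclusive create succeeds because the file does not exist yet.
    my $first = sysopen(my $fh1, $path, O_WRONLY | O_EXCL | O_CREAT) ? 1 : 0;

    # Second exclusive create fails because the file is now there.
    my $second = sysopen(my $fh2, $path, O_WRONLY | O_EXCL | O_CREAT) ? 1 : 0;
    my $why    = $second ? '' : "$!";

    print "first: $first, second: $second ($why)\n";

```

This check-and-create step happens in a single system call, which is
what makes the combination useful for simple lock files.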

=head2 Permissions E<agrave> la mode

If you omit the MASK argument to C<sysopen>, Perl uses the octal value
0666.  The normal MASK to use for executables and directories should
be 0777, and for anything else, 0666.

Why so permissive?  Well, it isn't really.  The MASK will be modified
by your process's current C<umask>.  A umask is a number representing
I<disabled> permissions bits; that is, bits that will not be turned on
in the created files' permissions field.

For example, if your C<umask> were 027, then the 020 part would
disable the group from writing, and the 007 part would disable others
from reading, writing, or executing.  Under these conditions, passing
C<sysopen> 0666 would create a file with mode 0640, since C<0666 &~ 027>
is 0640.
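
The arithmetic can be verified directly, and on a Unix-like system the
resulting file mode can be inspected with C<stat>.  This is a sketch
under those assumptions; the file name F<demo> and the temporary
directory are illustrative, and filesystems without Unix permission
bits may report something else.

```perl

    use strict;
    use warnings;
    use Fcntl;
    use File::Temp qw(tempdir);

    my $expected = 0666 & ~027;         # the calculation from the text

    my $dir = tempdir(CLEANUP => 1);
    my $old = umask(027);               # set the umask, remember the old one
    sysopen(my $fh, "$dir/demo", O_WRONLY | O_CREAT | O_EXCL, 0666)
        or die "can't create $dir/demo: $!";
    close($fh);
    umask($old);                        # restore the caller's umask

    my $mode = (stat("$dir/demo"))[2] & 07777;  # permission bits only
    printf "expected %04o, got %04o\n", $expected, $mode;

```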

You should seldom use the MASK argument to C<sysopen()>.  That takes
away the user's freedom to choose what permission new files will have.
Denying choice is almost always a bad thing.  One exception would be for
cases where sensitive or private data is being stored, such as with mail
folders, cookie files, and internal temporary files.

=head1 Obscure Open Tricks

=head2 Re-Opening Files (dups)

Sometimes you already have a filehandle open, and want to make another
handle that's a duplicate of the first one.  In the shell, we place an
ampersand in front of a file descriptor number when doing redirections.
For example, C<< 2>&1 >> makes descriptor 2 (that's STDERR in Perl)
be redirected into descriptor 1 (which is usually Perl's STDOUT).
The same is essentially true in Perl: a filename that begins with an
ampersand is treated instead as a file descriptor if a number, or as a
filehandle if a string.

    open(SAVEOUT, ">&SAVEERR") || die "couldn't dup SAVEERR: $!";
    open(MHCONTEXT, "<&4")     || die "couldn't dup fd4: $!";

That means that if a function is expecting a filename, but you don't
want to give it a filename because you already have the file open, you
can just pass the filehandle with a leading ampersand.  It's best to
use a fully qualified handle though, just in case the function happens
to be in a different package:

    somefunction("&main::LOGFILE");

This way if somefunction() is planning on opening its argument, it can
just use the already opened handle.  This differs from passing a handle,
because with a handle, you don't open the file.  Here you have something
you can pass to open.

If you have one of those tricky, newfangled I/O objects that the C++
folks are raving about, then this doesn't work because those aren't
proper filehandles in the native Perl sense.  You'll have to use fileno()
to pull out the proper descriptor number, assuming you can:

    use IO::Socket;
    $handle = IO::Socket::INET->new("www.perl.com:80");
    $fd = $handle->fileno;
    somefunction("&$fd");  # not an indirect function call

It can be easier (and certainly will be faster) just to use real
filehandles though:

    use IO::Socket;
    local *REMOTE = IO::Socket::INET->new("www.perl.com:80");
    die "can't connect" unless defined(fileno(REMOTE));
    somefunction("&main::REMOTE");

If the filehandle or descriptor number is preceded not just with a simple
"&" but rather with a "&=" combination, then Perl will not create a
completely new descriptor opened to the same place using the dup(2)
system call.  Instead, it will just make something of an alias to the
existing one using the fdopen(3S) library call.  This is slightly more
parsimonious of system resources, although this is less a concern
these days.  Here's an example of that:

    $fd = $ENV{"MHCONTEXTFD"};
    open(MHCONTEXT, "<&=$fd")   or die "couldn't fdopen $fd: $!";

If you're using magic C<< <ARGV> >>, you could even pass in as a
command line argument in @ARGV something like C<"<&=$MHCONTEXTFD">,
but we've never seen anyone actually do this.
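
To see the difference between "&" and "&=" for yourself, you can compare
descriptor numbers.  The following is only a sketch: it assumes a Unix-like
system, uses a scratch file from File::Temp in place of whatever handle you
already have, and the handle names COPY and ALIAS are invented for the
illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file stands in for a handle you already have open.
my ($orig, $name) = tempfile(UNLINK => 1);
my $fd = fileno($orig);

# "&" asks for dup(2): a brand-new descriptor on the same open file.
open(COPY, ">&$fd")   or die "couldn't dup fd $fd: $!";

# "&=" asks for fdopen(3): a new Perl handle on the very same descriptor.
open(ALIAS, ">&=$fd") or die "couldn't fdopen fd $fd: $!";

printf "original fd %d, dup fd %d, alias fd %d\n",
    $fd, fileno(COPY), fileno(ALIAS);
```

The dup should report a different number from the original, while the alias
reports the same one.  Bear in mind that closing an alias closes the
descriptor it shares with the original handle.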

=head2 Dispelling the Dweomer

Perl is more of a DWIMmer language than something like Java--where DWIM
is an acronym for "do what I mean".  But this principle sometimes leads
to more hidden magic than one knows what to do with.  In this way, Perl
is also filled with I<dweomer>, an obscure word meaning an enchantment.
Sometimes, Perl's DWIMmer is just too much like dweomer for comfort.

If magic C<open> is a bit too magical for you, you don't have to turn
to C<sysopen>.  To open a file with arbitrary weird characters in
it, it's necessary to protect any leading and trailing whitespace.
Leading whitespace is protected by inserting a C<"./"> in front of a
filename that starts with whitespace.  Trailing whitespace is protected
by appending an ASCII NUL byte (C<"\0">) at the end of the string.

    $file =~ s#^(\s)#./$1#;
    open(FH, "< $file\0")   || die "can't open $file: $!";

This assumes, of course, that your system considers dot the current
working directory, slash the directory separator, and disallows ASCII
NULs within a valid filename.  Most systems follow these conventions,
including all POSIX systems as well as proprietary Microsoft systems.
The only vaguely popular system that doesn't work this way is the
proprietary Macintosh system, which uses a colon where the rest of us
use a slash.  Maybe C<sysopen> isn't such a bad idea after all.
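
If your perl is version 5.6 or later, there is another way out that needs
neither trick: the three-argument form of C<open> keeps the mode and the
path in separate arguments, so the path is taken literally.  Here is a
minimal sketch, assuming a Unix-like filesystem; the file name with leading
and trailing spaces is made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/ odd name ";    # hypothetical name with surrounding spaces

# Mode and path are separate arguments, so nothing is trimmed or
# interpreted as magic.
open(my $out, ">", $file)   or die "can't create '$file': $!";
print $out "hello\n";
close($out)                 or die "can't close '$file': $!";

open(my $in, "<", $file)    or die "can't open '$file': $!";
my $line = <$in>;
close($in);
print "read back: $line";
```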

If you want to use C<< <ARGV> >> processing in a totally boring
and non-magical way, you could do this first:

    #   "Sam sat on the ground and put his head in his hands.  
    #   'I wish I had never come here, and I don't want to see 
    #   no more magic,' he said, and fell silent."
    for (@ARGV) { 
        s#^([^./])#./$1#;
        $_ .= "\0";
    } 
    while (<>) {  
        # now process $_
    } 

But be warned that users will not appreciate being unable to use "-"
to mean standard input, per the standard convention.
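
To see why, here is the same transformation applied to a few sample
arguments.  This is just an illustration; the sub name C<demagic> and the
sample names are invented, and nothing is actually opened.

```perl
use strict;
use warnings;

# Apply the same two edits the loop above makes to each @ARGV entry.
sub demagic {
    my $name = shift;
    $name =~ s#^([^./])#./$1#;   # anchor relative names to "./"
    $name .= "\0";               # guard any trailing whitespace
    return $name;
}

for my $arg ("-", " notes.txt", "/etc/motd") {
    (my $shown = demagic($arg)) =~ s/\0/\\0/;   # make the NUL visible
    print "'$arg' becomes '$shown'\n";
}
```

A lone "-" turns into "./-" followed by a NUL, which names a file called
"-" in the current directory rather than standard input.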

=head2 Paths as Opens

You've probably noticed how Perl's C<warn> and C<die> functions can
produce messages like:

    Some warning at scriptname line 29, <FH> line 7.

That's because you opened a filehandle FH, and had read in seven records
from it.  But what was the name of the file, not the handle?

If you aren't running with C<strict refs>, or if you've turned them off
temporarily, then all you have to do is this:

    open($path, "< $path") || die "can't open $path: $!";
    while (<$path>) {
        # whatever
    } 

Since you're using the pathname of the file as its handle,
you'll get warnings more like

    Some warning at scriptname line 29, </etc/motd> line 7.

=head2 Single Argument Open

Remember how we said that Perl's open took two arguments?  That was a
passive prevarication.  You see, it can also take just one argument.
If and only if the variable is a global variable, not a lexical, you
can pass C<open> just one argument, the filehandle, and it will 
get the path from the global scalar variable of the same name.

    $FILE = "/etc/motd";
    open FILE or die "can't open $FILE: $!";
    while (<FILE>) {
        # whatever
    } 

Why is this here?  Someone has to cater to the hysterical porpoises.
It's something that's been in Perl since the very beginning, if not
before.

=head2 Playing with STDIN and STDOUT

One clever move with STDOUT is to explicitly close it when you're done
with the program.

    END { close(STDOUT) || die "can't close stdout: $!" }

If you don't do this, and your program fills up the disk partition due
to a command line redirection, it won't report the error and exit with a
failure status.

You don't have to accept the STDIN and STDOUT you were given.  You are
welcome to reopen them if you'd like.

    open(STDIN, "< datafile")
	|| die "can't open datafile: $!";

    open(STDOUT, "> output")
	|| die "can't open output: $!";

And then these can be read directly or passed on to subprocesses.
This makes it look as though the program were initially invoked
with those redirections from the command line.
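
If the redirection should only be temporary, you can combine this with the
"&" dup syntax described earlier to save the original STDOUT and put it
back afterwards.  The following is a sketch rather than a recipe: the
handle name SAVEOUT and the scratch file are assumptions made for the
example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $output) = tempfile(UNLINK => 1);   # scratch file for the demo
close($tmp);

open(SAVEOUT, ">&STDOUT")   or die "can't dup STDOUT: $!";
open(STDOUT, "> $output")   or die "can't open $output: $!";
print "this line lands in the file\n";
close(STDOUT)               or die "can't close $output: $!";
open(STDOUT, ">&SAVEOUT")   or die "can't restore STDOUT: $!";
close(SAVEOUT);

print "this line goes wherever STDOUT pointed originally\n";
```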

It's probably more interesting to connect these to pipes.  For example:

    $pager = $ENV{PAGER} || "(less || more)";
    open(STDOUT, "| $pager")
	|| die "can't fork a pager: $!";

This makes it appear as though your program were called with its stdout
already piped into your pager.  You can also use this kind of thing
in conjunction with an implicit fork to yourself.  You might do this
if you would rather handle the post processing in your own program,
just in a different process:

    head(100);
    while (<>) {
        print;
    } 

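    sub head {
        my $lines = shift || 20;
        my $pid = open(STDOUT, "|-");   # implicit fork
        die "cannot fork: $!" unless defined $pid;
        return if $pid;                 # parent: STDOUT now feeds the child
        while (<STDIN>) {               # child: pass along the first $lines lines
            last if --$lines < 0;
            print;
        }
        exit;
    }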

This technique can be applied to repeatedly push as many filters on your
output stream as you wish.

=head1 Other I/O Issues

These topics aren't really arguments related to C<open> or C<sysopen>,
but they do affect what you do with your open files.

=head2 Opening Non-File Files

When is a file not a file?  Well, you could say when it exists but
isn't a plain file.   We'll check whether it's a symbolic link first,
just in case.

    if (-l $file || ! -f _) {
        print "$file is not a plain file\n";
    } 

What other kinds of files are there than, well, files?  Directories,
symbolic links, named pipes, Unix-domain sockets, and block and character
devices.  Those are all files, too--just not I<plain> files.  This isn't
the same issue as being a text file. Not all text files are plain files.
Not all plain files are text files.  That's why there are separate C<-f>
and C<-T> file tests.
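
The file tests for the other kinds can be strung together in the same way.
Here is a minimal sketch; the sub name C<file_kind> and its return strings
are invented for the example, and it checks for a symbolic link first, as
above, before looking at whatever the link points to.

```perl
use strict;
use warnings;

sub file_kind {
    my $file = shift;
    return "symbolic link"    if -l $file;      # lstat: don't follow the link
    return "nonexistent"      unless -e $file;  # stat: fills the _ buffer
    return "plain file"       if -f _;
    return "directory"        if -d _;
    return "named pipe"       if -p _;
    return "socket"           if -S _;
    return "block device"     if -b _;
    return "character device" if -c _;
    return "something else";
}

print "$_: ", file_kind($_), "\n" for @ARGV;
```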

To open a directory, you should use the C<opendir> function, then
process it with C<readdir>, carefully restoring the directory 
name if necessary:

    opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
    while (defined($file = readdir(DIR))) {
        # do something with "$dirname/$file"
    }
    closedir(DIR);
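
Restoring the directory name matters because C<readdir> returns bare
names, and a file test on a bare name looks in the current directory
instead.  Here is a sketch of a helper that lists the plain files directly
inside a directory; the sub name C<plain_files> is invented for the example.

```perl
use strict;
use warnings;

# Return the full paths of the plain files directly inside $dirname.
sub plain_files {
    my $dirname = shift;
    opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
    my @found;
    while (defined(my $file = readdir(DIR))) {
        next if $file eq "." || $file eq "..";
        my $path = "$dirname/$file";    # readdir returns bare names only
        push @found, $path if -f $path;
    }
    closedir(DIR);
    @found = sort @found;
    return @found;
}

print "$_\n" for plain_files(".");
```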

If you want to process directories recursively, it's better to use the
File::Find module.  For example, this prints out all files recursively,
and adds a slash to their names if the file is a directory.

    @ARGV = qw(.) unless @ARGV;
    use File::Find;
    find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV;

This finds all bogus symbolic links beneath a particular directory:

    find sub { print "$File::Find::name\n" if -l && !-e }, $dir;

As you see, with symbolic links, you can just pretend that it is
what it points to.  Or, if you want to know I<what> it points to, then
C<readlink> is called for:

    if (-l $file) {
        if (defined($whither = readlink($file))) {
            print "$file points to $whither\n";
        } else {
            print "$file points nowhere: $!\n";
        } 
    } 

Named pipes are a different matter.  You pretend they're regular files,
but their opens will normally block until there is both a reader and
a writer.  You can read more about them in L<perlipc/"Named Pipes">.
Unix-domain sockets are rather different beasts as well; they're
described in L<perlipc/"Unix-Domain TCP Clients and Servers">.

When it comes to opening devices, it can be easy and it can be tricky.
We'll assume that if you're opening up a block device, you know what
you're doing.  The character devices are more interesting.  These are
typically used for modems, mice, and some kinds of printers.  This is
described in L<perlfaq8/"How do I read and write the serial port?">.
It's often enough to open them carefully:

    use Fcntl;   # for the O_* constants
    sysopen(TTYIN, "/dev/ttyS1", O_RDWR | O_NDELAY | O_NOCTTY)
        # (O_NOCTTY no longer needed on POSIX systems)
        or die "can't open /dev/ttyS1: $!";
    open(TTYOUT, "+>&TTYIN")
        or die "can't dup TTYIN: $!";

    $ofh = select(TTYOUT); $| = 1; select($ofh);

    print TTYOUT "+++at\015";
    $answer = <TTYIN>;

With descriptors that you haven't opened using C<sysopen>, such as a
socket, you can set them to be non-blocking using C<fcntl>:

    use Fcntl;
    fcntl(Connection, F_SETFL, O_NONBLOCK) 
        or die "can't set non blocking: $!";
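
Note that C<F_SETFL> replaces the whole set of status flags, so if the
handle might already have other flags set, it is a little safer to fetch
the current flags and add C<O_NONBLOCK> to them.  Here is a sketch of that
variation, assuming a Unix-like system; the sub name C<set_nonblocking> is
invented, and a pipe stands in for the socket.

```perl
use strict;
use warnings;
use Fcntl;

sub set_nonblocking {
    my $fh = shift;
    my $flags = fcntl($fh, F_GETFL, 0)
        or die "can't get flags: $!";
    fcntl($fh, F_SETFL, $flags | O_NONBLOCK)
        or die "can't set non blocking: $!";
    return;
}

pipe(my $reader, my $writer) or die "can't make pipe: $!";
set_nonblocking($reader);

# Nothing has been written yet, so a blocking read would hang here.
my $got = sysread($reader, my $buf, 1024);
print "read would have blocked: $!\n" unless defined $got;
```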

Rather than losing yourself in a morass of twisting, turning C<ioctl>s,
all dissimilar, if you're going to manipulate ttys, it's best to
make calls out to the stty(1) program if you have it, or else use the
portable POSIX interface.  To figure this all out, you'll need to read the
termios(3) manpage, which describes the POSIX interface to tty devices,
and then L<POSIX>, which describes Perl's interface to POSIX.  There are
also some high-level modules on CPAN that can help you with these games.
Check out Term::ReadKey and Term::ReadLine.

What else can you open?  To open a connection using sockets, you won't use
one of Perl's two open functions.  See 
L<perlipc/"Sockets: Client/Server Communication"> for that.  Here's an 
example.  Once you have it, you can use FH as a bidirectional filehandle.

    use IO::Socket;
    local *FH = IO::Socket::INET->new("www.perl.com:80");

For opening up a URL, the LWP modules from CPAN are just what
the doctor ordered.  There's no filehandle interface, but
it's still easy to get the contents of a document:

    use LWP::Simple;
    $doc = get('http://www.linpro.no/lwp/');

=head2 Binary Files

On certain legacy systems with what could charitably be called terminally
convoluted (some would say broken) I/O models, a file isn't a file--at
least, not with respect to the C standard I/O library.  On these old
systems whose libraries (but not kernels) distinguish between text and
binary streams, to get files to behave properly you'll have to bend over
backwards to avoid nasty problems.  On such infelicitous systems, sockets
and pipes are already opened in binary mode, and there is currently no
way to turn that off.  With files, you have more options.

One option is to use the C<binmode> function on the appropriate
handles before doing regular I/O on them:

    binmode(STDIN);
    binmode(STDOUT);
    while (<STDIN>) { print } 

Passing C<sysopen> a non-standard flag option will also open the file in
binary mode on those systems that support it.  This is the equivalent of
opening the file normally, then calling C<binmode> on the handle.

    sysopen(BINDAT, "records.data", O_RDWR | O_BINARY)
        || die "can't open records.data: $!";

Now you can use C<read> and C<print> on that handle without worrying
about the system's non-standard I/O library breaking your data.  It's not
a pretty picture, but then, legacy systems seldom are.  CP/M will be
with us until the end of days, and after.

On systems with exotic I/O systems, it turns out that, astonishingly
enough, even unbuffered I/O using C<sysread> and C<syswrite> might do
sneaky data mutilation behind your back.

    while (sysread(WHENCE, $buf, 1024)) {
        syswrite(WHITHER, $buf, length($buf));
    } 

Depending on the vicissitudes of your runtime system, even these calls
may need C<binmode> or C<O_BINARY> first.  Systems known to be free of
such difficulties include Unix, the Mac OS, Plan9, and Inferno.
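
Putting those pieces together, a cautious byte-for-byte copy might look
like the following sketch.  The sub name C<copy_binary> is invented for the
example; on systems without the text/binary distinction the C<binmode>
calls are harmless.

```perl
use strict;
use warnings;

sub copy_binary {
    my ($from, $to) = @_;
    open(WHENCE,  "< $from") or die "can't open $from: $!";
    open(WHITHER, "> $to")   or die "can't create $to: $!";
    binmode(WHENCE);          # before any I/O on either handle
    binmode(WHITHER);

    my $buf;
    while (1) {
        my $got = sysread(WHENCE, $buf, 1024);
        die "read error on $from: $!" unless defined $got;
        last if $got == 0;    # end of file

        # syswrite may write less than asked, so loop until it's all out.
        my $off = 0;
        while ($off < $got) {
            my $wrote = syswrite(WHITHER, $buf, $got - $off, $off);
            die "write error on $to: $!" unless defined $wrote;
            $off += $wrote;
        }
    }
    close(WHENCE);
    close(WHITHER) or die "can't close $to: $!";
    return;
}
```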

=head2 File Locking

In a multitasking environment, you may need to be careful not to collide
with other processes that want to do I/O on the same files you
are working on.  You'll often need shared or exclusive locks
on files for reading and writing respectively.  You might just
pretend that only exclusive locks exist.

Never use the existence of a file C<-e $file> as a locking indication,
because there is a race condition between the test for the existence of
the file and its creation.  Atomicity is critical.
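
If what you really want is "create this file, but only if nobody else
already has", the test and the creation need to be a single operation.
Passing C<O_CREAT | O_EXCL> to C<sysopen> asks the system for exactly that.
The following is only a sketch, not a substitute for C<flock>: the sub name
C<create_exclusively> is invented, and it assumes a local filesystem, since
some network filesystems have not honoured C<O_EXCL> reliably.

```perl
use strict;
use warnings;
use Fcntl;

# Returns a write handle if we created the file, or nothing if it was
# already there.  The check and the creation are one system call.
sub create_exclusively {
    my $path = shift;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL) or return;
    return $fh;
}
```

A caller that gets nothing back can look at C<$!> to learn whether the
file already existed or something else went wrong.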

Perl's most portable locking interface is via the C<flock> function,
whose simplicity is emulated on systems that don't directly support it,
such as SysV or WindowsNT.  The underlying semantics may affect how
it all works, so you should learn how C<flock> is implemented on your
system's port of Perl.

File locking I<does not> lock out another process that would like to
do I/O.  A file lock only locks out others trying to get a lock, not
processes trying to do I/O.  Because locks are advisory, if one process
uses locking and another doesn't, all bets are off.

By default, the C<flock> call will block until a lock is granted.
A request for a shared lock will be granted as soon as there is no
exclusive locker.  A request for an exclusive lock will be granted as
soon as there is no locker of any kind.  Locks are on file descriptors,
not file names.  You can't lock a file until you open it, and you can't
hold on to a lock once the file has been closed.
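
These rules can be watched in action from a single program by opening the
same file twice.  This sketch assumes a system with a native flock(2),
where each separate open counts as a separate locker; where C<flock> is
emulated with fcntl(2), the results may differ.  The handle names ONE and
TWO and the scratch file are invented for the example.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempfile);

my ($tmp, $name) = tempfile(UNLINK => 1);
close($tmp);

# Two independent opens of the same file, as two processes would have.
open(ONE, "< $name") or die "can't open $name: $!";
open(TWO, "< $name") or die "can't open $name: $!";

flock(ONE, LOCK_SH) or die "can't lock $name: $!";

my $shared_ok = flock(TWO, LOCK_SH | LOCK_NB);  # shared locks coexist
flock(TWO, LOCK_UN);
my $excl_ok   = flock(TWO, LOCK_EX | LOCK_NB);  # refused while ONE is locked
close(ONE);                                     # closing gives up ONE's lock
my $after_ok  = flock(TWO, LOCK_EX | LOCK_NB);  # now nobody is in the way

printf "shared: %s, exclusive: %s, exclusive after close: %s\n",
    map { $_ ? "granted" : "refused" } $shared_ok, $excl_ok, $after_ok;
```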

Here's how to get a blocking shared lock on a file, typically used
for reading:

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    open(FH, "< filename")  or die "can't open filename: $!";
    flock(FH, LOCK_SH) 	    or die "can't lock filename: $!";
    # now read from FH

You can get a non-blocking lock by using C<LOCK_NB>.

    flock(FH, LOCK_SH | LOCK_NB)
        or die "can't lock filename: $!";

This can be useful for producing more user-friendly behaviour by warning
if you're going to be blocking:

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    open(FH, "< filename")  or die "can't open filename: $!";
    unless (flock(FH, LOCK_SH | LOCK_NB)) {
	$| = 1;
	print "Waiting for lock...";
	flock(FH, LOCK_SH)  or die "can't lock filename: $!";
	print "got it.\n"
    } 
    # now read from FH

To get an exclusive lock, typically used for writing, you have to be
careful.  We C<sysopen> the file so it can be locked before it gets
emptied.  You can get a nonblocking version using C<LOCK_EX | LOCK_NB>.

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    sysopen(FH, "filename", O_WRONLY | O_CREAT)
        or die "can't open filename: $!";
    flock(FH, LOCK_EX)
        or die "can't lock filename: $!";
    truncate(FH, 0)
        or die "can't truncate filename: $!";
    # now write to FH

Finally, due to the uncounted millions who cannot be dissuaded from
wasting cycles on useless vanity devices called hit counters, here's
how to increment a number in a file safely:

    use Fcntl qw(:DEFAULT :flock);

    sysopen(FH, "numfile", O_RDWR | O_CREAT)
        or die "can't open numfile: $!";
    # autoflush FH
    $ofh = select(FH); $| = 1; select ($ofh);
    flock(FH, LOCK_EX)
        or die "can't write-lock numfile: $!";

    $num = <FH> || 0;
    seek(FH, 0, 0)
        or die "can't rewind numfile : $!";
    print FH $num+1, "\n"
        or die "can't write numfile: $!";

    truncate(FH, tell(FH))
        or die "can't truncate numfile: $!";
    close(FH)
        or die "can't close numfile: $!";

=head1 SEE ALSO 

The C<open> and C<sysopen> functions in perlfunc(1);
the standard open(2), dup(2), fopen(3), and fdopen(3) manpages;
the POSIX documentation.

=head1 AUTHOR and COPYRIGHT

Copyright 1998 Tom Christiansen.  

When included as part of the Standard Version of Perl, or as part of
its complete documentation whether printed or otherwise, this work may
be distributed only under the terms of Perl's Artistic License.  Any
distribution of this file or derivatives thereof outside of that
package require that special arrangements be made with copyright
holder.

Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain.  You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit.  A simple comment in the code giving credit would be
courteous but is not required.

=head1 HISTORY

First release: Sat Jan  9 08:09:11 MST 1999

=head1 NAME

perlpod - plain old documentation

=head1 DESCRIPTION

A pod-to-whatever translator reads a pod file paragraph by paragraph,
and translates it to the appropriate output format.  There are
three kinds of paragraphs:
L<verbatim|/"Verbatim Paragraph">,
L<command|/"Command Paragraph">, and
L<ordinary text|/"Ordinary Block of Text">.

=head2 Verbatim Paragraph

A verbatim paragraph is distinguished by being indented (that is,
it starts with a space or tab).  It should be reproduced exactly,
with tabs assumed to be on 8-column boundaries.  There are no
special formatting escapes, so you can't italicize or anything
like that.  A \ means \, and nothing else.

=head2 Command Paragraph

All command paragraphs start with "=", followed by an
identifier, followed by arbitrary text that the command can
use however it pleases.  Currently recognized commands are

    =head1 heading
    =head2 heading
    =item text
    =over N
    =back
    =cut
    =pod
    =for X
    =begin X
    =end X

=over 4

=item =pod

=item =cut

The "=pod" directive does nothing beyond telling the compiler to lay
off parsing code through the next "=cut".  It's useful for adding
another paragraph to the doc if you're mixing up code and pod a lot.

=item =head1

=item =head2

Head1 and head2 produce first and second level headings, with the text in
the same paragraph as the "=headn" directive forming the heading description.

=item =over

=item =back

=item =item

Item, over, and back require a little more explanation: "=over" starts a
section specifically for the generation of a list using "=item" commands. At
the end of your list, use "=back" to end it. You will probably want to give
"4" as the number to "=over", as some formatters will use this for indentation.
The unit of indentation is optional. If the unit is not given the natural
indentation of the formatting system applied will be used. Note also that
there are some basic rules to using =item: don't use them outside of 
an =over/=back block, use at least one inside an =over/=back block, you don't
_have_ to include the =back if the list just runs off the document, and
perhaps most importantly, keep the items consistent: either use "=item *" for
all of them, to produce bullets, or use "=item 1.", "=item 2.", etc., to
produce numbered lists, or use "=item foo", "=item bar", etc., i.e., things
that look nothing like bullets or numbers. If you start with bullets or
numbers, stick with them, as many formatters use the first "=item" type to
decide how to format the list.

=item =for

=item =begin

=item =end

For, begin, and end let you include sections that are not interpreted
as pod text, but passed directly to particular formatters. A formatter
that can utilize that format will use the section, otherwise it will be
completely ignored.  The directive "=for" specifies that the entire next
paragraph is in the format indicated by the first word after
"=for", like this:

 =for html <br>
  <p> This is a raw HTML paragraph </p>

The paired commands "=begin" and "=end" work very similarly to "=for", but
instead of only accepting a single paragraph, all text from "=begin" to a
paragraph with a matching "=end" is treated as a particular format.

Here are some examples of how to use these:

 =begin html

 <br>Figure 1.<IMG SRC="figure1.png"><br>

 =end html

 =begin text

   ---------------
   |  foo        |
   |        bar  |
   ---------------

 ^^^^ Figure 1. ^^^^

 =end text

Some format names that formatters currently are known to accept include
"roff", "man", "latex", "tex", "text", and "html". (Some formatters will
treat some of these as synonyms.)

And don't forget, when using any command, that the command lasts up until
the end of the B<paragraph>, not the line. Hence in the examples below, you
can see the empty lines after each command to end its paragraph.

Some examples of lists include:

 =over 4

 =item *

 First item

 =item *

 Second item

 =back

 =over 4

 =item Foo()

 Description of Foo function

 =item Bar()

 Description of Bar function

 =back

=back

=head2 Ordinary Block of Text

It will be filled, and maybe even
justified.  Certain interior sequences are recognized both
here and in commands:

    I<text>     Italicize text, used for emphasis or variables
    B<text>     Embolden text, used for switches and programs
    S<text>     Text contains non-breaking spaces
    C<code>     Render code in a typewriter font, or give some other
                indication that this represents program text
    L<name>     A link (cross reference) to name
		    L<name>		manual page
		    L<name/ident>	item in manual page
		    L<name/"sec">	section in other manual page
		    L<"sec">		section in this manual page
					(the quotes are optional)
		    L</"sec">		ditto
		same as above but only 'text' is used for output.
		(Text can not contain the characters '/' and '|', 
		and should contain matched '<' or '>')
		    L<text|name>
		    L<text|name/ident>
		    L<text|name/"sec">
		    L<text|"sec">
		    L<text|/"sec">

    F<file>	Used for filenames
    X<index>	An index entry
    Z<>		A zero-width character
    E<escape>   A named character (very similar to HTML escapes)
		    E<lt>		A literal <
		    E<gt>		A literal >
		    E<sol>		A literal /
		    E<verbar>		A literal |
		    (these are optional except in other interior
		     sequences and when preceded by a capital letter)
		    E<n>		Character number n (probably in ASCII)
    	    	    E<html>		Some non-numeric HTML entity, such
					as E<Agrave>

Most of the time, you will only need a single set of angle brackets to
delimit the beginning and end of interior sequences.  However, sometimes
you will want to put a right angle bracket (or greater-than sign '>')
inside of a sequence.  This is particularly common when using a sequence
to provide a different font-type for a snippet of code.  As with all
things in Perl, there is more than one way to do it.  One way is to
simply escape the closing bracket using an C<E> sequence:

    C<$a E<lt>=E<gt> $b>

This will produce: "C<$a E<lt>=E<gt> $b>"

A more readable, and perhaps more "plain" way is to use an alternate set of
delimiters that doesn't require a ">" to be escaped.  As of perl5.5.660,
doubled angle brackets ("<<" and ">>") may be used I<if and only if there
is whitespace immediately following the opening delimiter and immediately
preceding the closing delimiter!> For example, the following will do the
trick:

    C<< $a <=> $b >>

In fact, you can use as many repeated angle-brackets as you like so
long as you have the same number of them in the opening and closing
delimiters, and make sure that whitespace immediately follows the last
'<' of the opening delimiter, and immediately precedes the first '>' of
the closing delimiter.  So the following will also work:

    C<<< $a <=> $b >>>
    C<<<< $a <=> $b >>>>

This is currently supported by pod2text (Pod::Text), pod2man (Pod::Man),
and any other pod2xxx and Pod::Xxxx translator that uses Pod::Parser
1.093 or later.
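
One way to check how a translator treats the doubled delimiters is to feed
it a small document from a script.  The sketch below assumes a Pod::Text
built on Pod::Simple, as shipped with recent perls, which provides the
C<output_string> and C<parse_string_document> methods; the sample sentence
is invented.

```perl
use strict;
use warnings;
use Pod::Text;

# Build the pod line by line so no pod command starts a line of this script.
my $pod = join("\n",
    "=pod",
    "",
    'Sort numerically with C<< $a <=> $b >> in the comparison block.',
    "",
    "=cut",
    "");

my $text   = '';
my $parser = Pod::Text->new;
$parser->output_string(\$text);        # collect the output in $text
$parser->parse_string_document($pod);
print $text;
```

The rendered text should contain the comparison operator intact, without
any C<E> escapes having been needed in the source.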


=head2 The Intent

That's it.  The intent is simplicity, not power.  I wanted paragraphs
to look like paragraphs (block format), so that they stand out
visually, and so that I could run them through fmt easily to reformat
them (that's F7 in my version of B<vi>).  I wanted the translator (and not
me) to worry about whether " or ' is a left quote or a right quote
within filled text, and I wanted it to leave the quotes alone, dammit, in
verbatim mode, so I could slurp in a working program, shift it over 4
spaces, and have it print out, er, verbatim.  And presumably in a
constant width font.

In particular, you can leave things like this verbatim in your text:

    Perl
    FILEHANDLE
    $variable
    function()
    manpage(3r)

Doubtless a few other commands or sequences will need to be added along
the way, but I've gotten along surprisingly well with just these.

Note that I'm not at all claiming this to be sufficient for producing a
book.  I'm just trying to make an idiot-proof common source for nroff,
TeX, and other markup languages, as used for online documentation.
Translators exist for B<pod2man>  (that's for nroff(1) and troff(1)),
B<pod2text>, B<pod2html>, B<pod2latex>, and B<pod2fm>.

=head2 Embedding Pods in Perl Modules

You can embed pod documentation in your Perl scripts.  Start your
documentation with a "=head1" command at the beginning, and end it
with a "=cut" command.  Perl will ignore the pod text.  See any of the
supplied library modules for examples.  If you're going to put your
pods at the end of the file, and you're using an __END__ or __DATA__
cut mark, make sure to put an empty line there before the first pod
directive.

    __END__

    =head1 NAME

    modern - I am a modern module

If you had not had that empty line there, then the translators wouldn't
have seen it.

=head2 Common Pod Pitfalls

=over 4

=item *

Pod translators usually will require paragraphs to be separated by
completely empty lines.  If you have an apparently empty line with
some spaces on it, this can cause odd formatting.

=item *

Translators will mostly add wording around a LE<lt>E<gt> link, so that
C<LE<lt>foo(1)E<gt>> becomes "the I<foo>(1) manpage", for example (see
B<pod2man> for details).  Thus, you shouldn't write things like C<the
LE<lt>fooE<gt> manpage>, if you want the translated document to read
sensibly.

If you need total control of the text used for a link in the output
use the form LE<lt>show this text|fooE<gt> instead.

=item *

The B<podchecker> command is provided to check pod syntax
for errors and warnings. For example, it checks for completely
blank lines in pod segments and for unknown escape sequences.
It is still advisable to pass your pod through
one or more translators and proofread the result, or print out the
result and proofread that.  Some of the problems found may be bugs in
the translators, which you may or may not wish to work around.

=back

=head1 SEE ALSO

L<pod2man>, L<perlsyn/"PODs: Embedded Documentation">,
L<podchecker>

=head1 AUTHOR

Larry Wall

                                     ? $H d,A   |  A |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                 =head1 NAME

perlport - Writing portable Perl

=head1 DESCRIPTION

Perl runs on numerous operating systems.  While most of them share
much in common, they also have their own unique features.

This document is meant to help you to find out what constitutes portable
Perl code.  That way once you make a decision to write portably,
you know where the lines are drawn, and you can stay within them.

There is a tradeoff between taking full advantage of one particular
type of computer and taking advantage of a full range of them.
Naturally, as you broaden your range and become more diverse, the
common factors drop, and you are left with an increasingly smaller
area of common ground in which you can operate to accomplish a
particular task.  Thus, when you begin attacking a problem, it is
important to consider under which part of the tradeoff curve you
want to operate.  Specifically, you must decide whether it is
important that the task that you are coding have the full generality
of being portable, or whether to just get the job done right now.
This is the hardest choice to be made.  The rest is easy, because
Perl provides many choices, whichever way you want to approach your
problem.

Looking at it another way, writing portable code is usually about
willfully limiting your available choices.  Naturally, it takes
discipline and sacrifice to do that.  The product of portability
and convenience may be a constant.  You have been warned.

Be aware of two important points:

=over 4

=item Not all Perl programs have to be portable

There is no reason you should not use Perl as a language to glue Unix
tools together, or to prototype a Macintosh application, or to manage the
Windows registry.  If it makes no sense to aim for portability for one
reason or another in a given program, then don't bother.

=item Nearly all of Perl already I<is> portable

Don't be fooled into thinking that it is hard to create portable Perl
code.  It isn't.  Perl tries its level-best to bridge the gaps between
what's available on different platforms, and all the means available to
use those features.  Thus almost all Perl code runs on any machine
without modification.  But there are some significant issues in
writing portable code, and this document is entirely about those issues.

=back

Here's the general rule: When you approach a task commonly done
using a whole range of platforms, think about writing portable
code.  That way, you don't sacrifice much by way of the implementation
choices you can avail yourself of, and at the same time you can give
your users lots of platform choices.  On the other hand, when you have to
take advantage of some unique feature of a particular platform, as is
often the case with systems programming (whether for Unix, Windows,
S<Mac OS>, VMS, etc.), consider writing platform-specific code.

When the code will run on only two or three operating systems, you
may need to consider only the differences of those particular systems.
The important thing is to decide where the code will run and to be
deliberate in your decision.

The material below is separated into three main sections: main issues of
portability (L<"ISSUES">), platform-specific issues (L<"PLATFORMS">), and
built-in perl functions that behave differently on various ports
(L<"FUNCTION IMPLEMENTATIONS">).

This information should not be considered complete; it includes possibly
transient information about idiosyncrasies of some of the ports, almost
all of which are in a state of constant evolution.  Thus, this material
should be considered a perpetual work in progress
(<IMG SRC="yellow_sign.gif" ALT="Under Construction">).

=head1 ISSUES

=head2 Newlines

In most operating systems, lines in files are terminated by newlines.
Just what is used as a newline may vary from OS to OS.  Unix
traditionally uses C<\012>, one type of DOSish I/O uses C<\015\012>,
and S<Mac OS> uses C<\015>.

Perl uses C<\n> to represent the "logical" newline, where what is
logical may depend on the platform in use.  In MacPerl, C<\n> always
means C<\015>.  In DOSish perls, C<\n> usually means C<\012>, but
when accessing a file in "text" mode, STDIO translates it to (or
from) C<\015\012>, depending on whether you're reading or writing.
Unix does the same thing on ttys in canonical mode.  C<\015\012>
is commonly referred to as CRLF.

A common cause of unportable programs is the misuse of chop() to trim
newlines:

    # XXX UNPORTABLE!
    while(<FILE>) {
        chop;
        @array = split(/:/);
        #...
    }

You can get away with this on Unix and MacOS (they have a single
character end-of-line), but the same program will break under DOSish
perls because you're only chop()ing half the end-of-line.  Instead,
chomp() should be used to trim newlines.  The Dunce::Files module can
help audit your code for misuses of chop().

When dealing with binary files (or text files in binary mode) be sure
to explicitly set $/ to the appropriate value for your file format
before using chomp().
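
As a minimal sketch of that advice, the hypothetical helper below
(C<chomp_crlf> is not part of Perl or of any module) sets C<$/> locally
before calling chomp(), assuming the records in the file end in CRLF:

    # Hypothetical helper: strip one CRLF terminator, whatever "\n" means here.
    sub chomp_crlf {
        my ($line) = @_;
        local $/ = "\015\012";   # assumed record terminator of the file format
        chomp $line;             # removes the trailing $/ only if it is present
        return $line;
    }

Because C<$/> is localized, the caller's input record separator is left
untouched.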

Because of the "text" mode translation, DOSish perls have limitations
in using C<seek> and C<tell> on a file accessed in "text" mode.
Stick to C<seek>-ing to locations you got from C<tell> (and no
others), and you are usually free to use C<seek> and C<tell> even
in "text" mode.  Using C<seek> or C<tell> in any other way on such a
file may be non-portable.  If you use C<binmode> on a file, however, you
can usually C<seek> and C<tell> with arbitrary values in safety.

A common misconception in socket programming is that C<\n> eq C<\012>
everywhere.  When using protocols such as common Internet protocols,
C<\012> and C<\015> are called for specifically, and the values of
the logical C<\n> and C<\r> (carriage return) are not reliable.

    print SOCKET "Hi there, client!\r\n";      # WRONG
    print SOCKET "Hi there, client!\015\012";  # RIGHT

However, using C<\015\012> (or C<\cM\cJ>, or C<\x0D\x0A>) can be tedious
and unsightly, as well as confusing to those maintaining the code.  As
such, the Socket module supplies the Right Thing for those who want it.

    use Socket qw(:DEFAULT :crlf);
    print SOCKET "Hi there, client!$CRLF";     # RIGHT

When reading from a socket, remember that the default input record
separator C<$/> is C<\n>, but robust socket code will recognize
either C<\012> or C<\015\012> as end of line:

    while (<SOCKET>) {
        # ...
    }

Because both CRLF and LF end in LF, the input record separator can
be set to LF and any CR stripped later.  Better to write:

    use Socket qw(:DEFAULT :crlf);
    local($/) = LF;      # not needed if $/ is already \012

    while (<SOCKET>) {
        s/$CR?$LF/\n/;   # not sure if socket uses LF or CRLF, OK
    #   s/\015?\012/\n/; # same thing
    }

This example is preferred over the previous one--even for Unix
platforms--because now any C<\015>'s (C<\cM>'s) are stripped out
(and there was much rejoicing).

Similarly, functions that return text data--such as a function that
fetches a web page--should sometimes translate newlines before
returning the data, if they've not yet been translated to the local
newline representation.  A single line of code will often suffice:

    $data =~ s/\015?\012/\n/g;
    return $data;

Some of this may be confusing.  Here's a handy reference to the ASCII CR
and LF characters.  You can print it out and stick it in your wallet.

    LF  ==  \012  ==  \x0A  ==  \cJ  ==  ASCII 10
    CR  ==  \015  ==  \x0D  ==  \cM  ==  ASCII 13

             | Unix | DOS  | Mac  |
        ---------------------------
        \n   |  LF  |  LF  |  CR  |
        \r   |  CR  |  CR  |  LF  |
        \n * |  LF  | CRLF |  CR  |
        \r * |  CR  |  CR  |  LF  |
        ---------------------------
        * text-mode STDIO

The Unix column assumes that you are not accessing a serial line
(like a tty) in canonical mode.  If you are, then CR on input becomes
"\n", and "\n" on output becomes CRLF.

These are just the most common definitions of C<\n> and C<\r> in Perl.
There may well be others.

=head2 Numbers endianness and Width

Different CPUs store integers and floating point numbers in different
orders (called I<endianness>) and widths (32-bit and 64-bit being the
most common today).  This affects your programs when they attempt to transfer
numbers in binary format from one CPU architecture to another,
usually either "live" via network connection, or by storing the
numbers to secondary storage such as a disk file or tape.

Conflicting storage orders make an utter mess out of the numbers.  If a
little-endian host (Intel, VAX) stores 0x12345678 (305419896 in
decimal), a big-endian host (Motorola, Sparc, PA) reads it as
0x78563412 (2018915346 in decimal).  Alpha and MIPS can be either:
Digital/Compaq used/uses them in little-endian mode; SGI/Cray uses
them in big-endian mode.  To avoid this problem in network (socket)
connections use the C<pack> and C<unpack> formats C<n> and C<N>, the
"network" orders.  These are guaranteed to be portable.

You can explore the endianness of your platform by unpacking a
data structure packed in native format such as:

    print unpack("h*", pack("s2", 1, 2)), "\n";
    # '10002000' on e.g. Intel x86 or Alpha 21064 in little-endian mode
    # '00100020' on e.g. Motorola 68040

If you need to distinguish between endian architectures you could use
either of the variables set like so:

    $is_big_endian   = unpack("h*", pack("s", 1)) =~ /01/;
    $is_little_endian = unpack("h*", pack("s", 1)) =~ /^1/;
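
Code that sticks to the network orders rarely needs to make that
distinction at all.  The sketch below assumes a single unsigned 32-bit
value is being transferred; C<encode_u32> and C<decode_u32> are
hypothetical names used only for illustration.

    # Hypothetical helpers for moving one unsigned 32-bit value over the wire.
    sub encode_u32 { return pack("N", $_[0]) }     # 4 bytes, high byte first
    sub decode_u32 { return unpack("N", $_[0]) }

    my $wire = encode_u32(0x12345678);
    print join(' ', map { sprintf("%02x", ord($_)) } split(//, $wire)), "\n";
    # prints "12 34 56 78" regardless of the host's byte order

The same pair of calls works for 16-bit values with the C<n> format.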

Differing widths can cause truncation even between platforms of equal
endianness.  The platform of shorter width loses the upper parts of the
number.  There is no good solution for this problem except to avoid
transferring or storing raw binary numbers.

One can circumvent both these problems in two ways.  Either
transfer and store numbers always in text format, instead of raw
binary, or else consider using modules like Data::Dumper (included in
the standard distribution as of Perl 5.005) and Storable (included as
of perl 5.8).  Keeping all data as text significantly simplifies matters.
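
A minimal sketch of the Storable route follows.  The record contents are
made up for illustration; C<nfreeze> and C<thaw> are the Storable
functions that serialize in network order and restore again.

    use Storable qw(nfreeze thaw);

    my %record = (id => 305419896, name => 'example');

    # nfreeze() serializes in network (big-endian) byte order, so the
    # resulting string can be thawed on a host of either endianness.
    my $frozen = nfreeze(\%record);
    my $copy   = thaw($frozen);

    print "$copy->{id} $copy->{name}\n";   # prints "305419896 example"

The companion functions C<nstore> and C<retrieve> do the same with a
file instead of a string.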

The v-strings are portable only up to v2147483647 (0x7FFFFFFF); that is
how far EBCDIC, or more precisely UTF-EBCDIC, will go.

=head2 Files and Filesystems

Most platforms these days structure files in a hierarchical fashion.
So, it is reasonably safe to assume that all platforms support the
notion of a "path" to uniquely identify a file on the system.  How
that path is really written, though, differs considerably.

Although similar, file path specifications differ between Unix,
Windows, S<Mac OS>, OS/2, VMS, VOS, S<RISC OS>, and probably others.
Unix, for example, is one of the few OSes that has the elegant idea
of a single root directory.

DOS, OS/2, VMS, VOS, and Windows can work similarly to Unix with C</>
as path separator, or in their own idiosyncratic ways (such as having
several root directories and various "unrooted" device files such as NIL:
and LPT:).

S<Mac OS> uses C<:> as a path separator instead of C</>.

The filesystem may support neither hard links (C<link>) nor
symbolic links (C<symlink>, C<readlink>, C<lstat>).

The filesystem may support neither an access timestamp nor a change
timestamp (meaning that about the only portable timestamp is the
modification timestamp), nor one-second granularity for any timestamp
(e.g. the FAT filesystem limits the time granularity to two seconds).

The "inode change timestamp" (the C<-C> filetest) may really be the
"creation timestamp" (which it is not in UNIX).

VOS perl can emulate Unix filenames with C</> as path separator.  The
native pathname characters greater-than, less-than, number-sign, and
percent-sign are always accepted.

S<RISC OS> perl can emulate Unix filenames with C</> as path
separator, or go native and use C<.> for path separator and C<:> to
signal filesystems and disk names.

Don't assume UNIX filesystem access semantics: that read, write,
and execute are all the permissions there are, and even if they exist,
that their semantics (for example what do r, w, and x mean on
a directory) are the UNIX ones.  The various UNIX/POSIX compatibility
layers usually try to make interfaces like chmod() work, but sometimes
there simply is no good mapping.

If all this is intimidating, have no (well, maybe only a little)
fear.  There are modules that can help.  The File::Spec modules
provide methods to do the Right Thing on whatever platform happens
to be running the program.

    use File::Spec::Functions;
    chdir(updir());        # go up one directory
    $file = catfile(curdir(), 'temp', 'file.txt');
    # on Unix and Win32, './temp/file.txt'
    # on Mac OS, ':temp:file.txt'
    # on VMS, '[.temp]file.txt'

File::Spec is available in the standard distribution as of version
5.004_05.  File::Spec::Functions is only in File::Spec 0.7 and later,
and some versions of perl come with version 0.6.  If File::Spec
is not updated to 0.7 or later, you must use the object-oriented
interface from File::Spec (or upgrade File::Spec).
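
With the object-oriented interface the same operations are written as
class methods.  This is only a sketch; the directory and file names are
placeholders.

    use File::Spec;

    # Class-method equivalents of catfile() and updir() from
    # File::Spec::Functions.
    my $file   = File::Spec->catfile('temp', 'file.txt');
    my $parent = File::Spec->updir();

    print "$file\n";     # 'temp/file.txt' on Unix
    print "$parent\n";   # '..' on Unix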

In general, production code should not have file paths hardcoded.
Making them user-supplied or read from a configuration file is
better, keeping in mind that file path syntax varies on different
machines.

This is especially noticeable in scripts like Makefiles and test suites,
which often assume C</> as a path separator for subdirectories.

Also of use is File::Basename from the standard distribution, which
splits a pathname into pieces (base filename, full path to directory,
and file suffix).
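
For instance, C<fileparse> returns the three pieces in one call.  The
path below is only an illustration, and the pattern tells the function
what should count as a suffix.

    use File::Basename qw(fileparse);

    my ($name, $dir, $suffix) =
        fileparse('/usr/lib/perl5/Config.pm', qr/\.[^.]*/);

    print "$dir | $name | $suffix\n";
    # on Unix: "/usr/lib/perl5/ | Config | .pm"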

Even when on a single platform (if you can call Unix a single platform),
remember not to count on the existence or the contents of particular
system-specific files or directories, like F</etc/passwd>,
F</etc/sendmail.conf>, F</etc/resolv.conf>, or even F</tmp/>.  For
example, F</etc/passwd> may exist but not contain the encrypted
passwords, because the system is using some form of enhanced security.
Or it may not contain all the accounts, because the system is using NIS. 
If code does need to rely on such a file, include a description of the
file and its format in the code's documentation, then make it easy for
the user to override the default location of the file.

Don't assume a text file will end with a newline.  It should,
but people forget.

Do not have two files or directories of the same name with different
case, like F<test.pl> and F<Test.pl>, as many platforms have
case-insensitive (or at least case-forgiving) filenames.  Also, try
not to have non-word characters (except for C<.>) in the names, and
keep them to the 8.3 convention, for maximum portability, onerous a
burden though this may appear.

Likewise, when using the AutoSplit module, try to keep your functions to
8.3 naming and case-insensitive conventions; or, at the least,
make it so the resulting files have a unique (case-insensitively)
first 8 characters.

Whitespace in filenames is tolerated on most systems, but not all,
and even on systems where it might be tolerated, some utilities
might become confused by such whitespace.

Many systems (DOS, VMS) cannot have more than one C<.> in their filenames.

Don't assume C<< > >> won't be the first character of a filename.
Always use C<< < >> explicitly to open a file for reading, or even
better, use the three-arg version of open, unless you want the user to
be able to specify a pipe open.

    open(FILE, '<', $existing_file) or die $!;

If filenames might use strange characters, it is safest to open them
with C<sysopen> instead of C<open>.  C<open> is magic and can
translate characters like C<< > >>, C<< < >>, and C<|>, which may
be the wrong thing to do.  (Sometimes, though, it's the right thing.)
Three-arg open can also help protect against this translation in cases
where it is undesirable.
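
A sketch of the C<sysopen> approach, assuming the file only needs to be
read; C<open_literal> is a hypothetical helper name.

    use Fcntl qw(O_RDONLY);

    # Hypothetical helper: open $name for reading, taking every character
    # of the name literally (no mode prefixes, no pipes).
    sub open_literal {
        my ($name) = @_;
        sysopen(my $fh, $name, O_RDONLY)
            or die "Can't open '$name': $!";
        return $fh;
    }

The returned handle can be read with the usual C<< <$fh> >> operator.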

Don't use C<:> as a part of a filename since many systems use that for
their own semantics (MacOS Classic for separating pathname components,
many networking schemes and utilities for separating the nodename and
the pathname, and so on).  For the same reasons, avoid C<@>, C<;> and
C<|>.

Don't assume that in pathnames you can collapse two leading slashes
C<//> into one: some networking and clustering filesystems have special
semantics for that.  Let the operating system sort it out.

The I<portable filename characters> as defined by ANSI C are

 a b c d e f g h i j k l m n o p q r s t u v w x y z
 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
 0 1 2 3 4 5 6 7 8 9
 . _ -

and the "-" shouldn't be the first character.  If you want to be
hypercorrect, stay case-insensitive and within the 8.3 naming
convention (all the files and directories have to be unique within one
directory if their names are lowercased and truncated to eight
characters before the C<.>, if any, and to three characters after the
C<.>, if any).  (And do not use C<.>s in directory names.)

=head2 System Interaction

Not all platforms provide a command line.  These are usually platforms
that rely primarily on a Graphical User Interface (GUI) for user
interaction.  A program requiring a command line interface might
not work everywhere.  This is probably for the user of the program
to deal with, so don't stay up late worrying about it.

Some platforms can't delete or rename files held open by the system.
Remember to C<close> files when you are done with them.  Don't
C<unlink> or C<rename> an open file.  Don't C<tie> or C<open> a
file already tied or opened; C<untie> or C<close> it first.

Don't open the same file more than once at a time for writing, as some
operating systems put mandatory locks on such files.

Don't assume that write/modify permission on a directory gives the
right to add or delete files/directories in that directory.  That is
filesystem specific: in some filesystems you need write/modify
permission also (or even just) in the file/directory itself.  In some
filesystems (AFS, DFS) the permission to add/delete directory entries
is a completely separate permission.

Don't assume that a single C<unlink> completely gets rid of the file:
some filesystems (most notably the ones in VMS) have versioned
filesystems, and unlink() removes only the most recent one (it doesn't
remove all the versions because by default the native tools on those
platforms remove just the most recent version, too).  The portable
idiom to remove all the versions of a file is

    1 while unlink "file";

This will terminate if the file is undeleteable for some reason
(protected, not there, and so on).

Don't count on a specific environment variable existing in C<%ENV>.
Don't count on C<%ENV> entries being case-sensitive, or even
case-preserving.  Don't try to clear %ENV by saying C<%ENV = ();>, or,
if you really have to, make it conditional on C<$^O ne 'VMS'> since in
VMS the C<%ENV> table is much more than a per-process key-value string
table.

Don't count on signals or C<%SIG> for anything.

Don't count on filename globbing.  Use C<opendir>, C<readdir>, and
C<closedir> instead.
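
As a sketch of that substitution, the hypothetical helper below lists
the names in a directory that end in a given suffix, which is roughly
what a C<*.txt> glob would be used for.

    use File::Spec;

    # Hypothetical helper: names in $dir that end in $suffix, without glob().
    sub files_with_suffix {
        my ($dir, $suffix) = @_;
        opendir(my $dh, $dir) or die "Can't open directory '$dir': $!";
        my @names = sort grep { /\Q$suffix\E\z/ } readdir($dh);
        closedir($dh);
        return @names;
    }

    print "$_\n" for files_with_suffix(File::Spec->curdir(), '.txt');

The match here is case-sensitive; on a case-insensitive filesystem you
may want to add the C</i> modifier.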

Don't count on per-program environment variables, or per-program current
directories.

Don't count on specific values of C<$!>.

=head2 Interprocess Communication (IPC)

In general, don't directly access the system in code meant to be
portable.  That means, no C<system>, C<exec>, C<fork>, C<pipe>,
C<``>, C<qx//>, C<open> with a C<|>, nor any of the other things
that make being a perl hacker worth being.

Commands that launch external processes are generally supported on
most platforms (though many of them do not support any type of
forking).  The problem with using them arises from what you invoke
them on.  External tools are often named differently on different
platforms, may not be available in the same location, might accept
different arguments, can behave differently, and often present their
results in a platform-dependent way.  Thus, you should seldom depend
on them to produce consistent results. (Then again, if you're calling 
I<netstat -a>, you probably don't expect it to run on both Unix and CP/M.)

One especially common bit of Perl code is opening a pipe to B<sendmail>:

    open(MAIL, '|/usr/lib/sendmail -t') 
	or die "cannot fork sendmail: $!";

This is fine for systems programming when sendmail is known to be
available.  But it is not fine for many non-Unix systems, and even
some Unix systems that may not have sendmail installed.  If a portable
solution is needed, see the various distributions on CPAN that deal
with it.  Mail::Mailer and Mail::Send in the MailTools distribution are
commonly used, and provide several mailing methods, including mail,
sendmail, and direct SMTP (via Net::SMTP) if a mail transfer agent is
not available.  Mail::Sendmail is a standalone module that provides
simple, platform-independent mailing.

The Unix System V IPC (C<msg*(), sem*(), shm*()>) is not available
even on all Unix platforms.

Do not use either the bare result of C<pack("C4", 10, 20, 30, 40)> or
bare v-strings (such as C<v10.20.30.40>) to represent IPv4 addresses:
both forms just pack the four bytes into network order.  That this
would be equal to the C language C<in_addr> struct (which is what the
socket code internally uses) is not guaranteed.  To be portable use
the routines of the Socket extension, such as C<inet_aton()>,
C<inet_ntoa()>, and C<sockaddr_in()>.
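
A short sketch of those routines; the address and port are arbitrary
examples.

    use Socket qw(inet_aton inet_ntoa sockaddr_in);

    my $addr = inet_aton('10.20.30.40')      # packed address, opaque to us
        or die "Can't parse address";
    my $peer = sockaddr_in(80, $addr);       # scalar context: pack port+address

    my ($port, $ip) = sockaddr_in($peer);    # list context: unpack again
    print inet_ntoa($ip), ":$port\n";        # prints "10.20.30.40:80"

The packed values are treated as opaque strings throughout, so the code
never depends on how the platform lays out its address structures.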

The rule of thumb for portable code is: Do it all in portable Perl, or
use a module (that may internally implement it with platform-specific
code, but expose a common interface).

=head2 External Subroutines (XS)

XS code can usually be made to work with any platform, but dependent
libraries, header files, etc., might not be readily available or
portable, or the XS code itself might be platform-specific, just as Perl
code might be.  If the libraries and headers are portable, then it is
normally reasonable to make sure the XS code is portable, too.

A different type of portability issue arises when writing XS code:
availability of a C compiler on the end-user's system.  C brings
with it its own portability issues, and writing XS code will expose
you to some of those.  Writing purely in Perl is an easier way to
achieve portability.

=head2 Standard Modules

In general, the standard modules work across platforms.  Notable
exceptions are the CPAN module (which currently makes connections to external
programs that may not be available), platform-specific modules (like
ExtUtils::MM_VMS), and DBM modules.

There is no one DBM module available on all platforms.
SDBM_File and the others are generally available on all Unix and DOSish
ports, but not in MacPerl, where only NDBM_File and DB_File are
available.

The good news is that at least some DBM module should be available, and
AnyDBM_File will use whichever module it can find.  Of course, then
the code needs to be fairly strict, dropping to the greatest common
factor (e.g., not exceeding 1K for each record), so that it will
work with any DBM module.  See L<AnyDBM_File> for more details.

=head2 Time and Date

The system's notion of time of day and calendar date is controlled in
widely different ways.  Don't assume the timezone is stored in C<$ENV{TZ}>,
and even if it is, don't assume that you can control the timezone through
that variable.

Don't assume that the epoch starts at 00:00:00, January 1, 1970,
because that is OS- and implementation-specific.  It is better to store a date
in an unambiguous representation.  The ISO-8601 standard defines
"YYYY-MM-DD" as the date format.  A text representation (like "1987-12-18")
can be easily converted into an OS-specific value using a module like
Date::Parse.  An array of values, such as those returned by
C<localtime>, can be converted to an OS-specific representation using
Time::Local.
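
A minimal sketch of the second route, assuming dates are stored as
"YYYY-MM-DD" and mean midnight UTC; C<iso_date_to_epoch> is a
hypothetical helper built on Time::Local's C<timegm>.

    use Time::Local qw(timegm);

    # Hypothetical helper: "YYYY-MM-DD" to this system's own time value.
    sub iso_date_to_epoch {
        my ($date) = @_;
        my ($y, $m, $d) = $date =~ /\A(\d{4})-(\d{2})-(\d{2})\z/
            or die "Not a YYYY-MM-DD date: '$date'";
        return timegm(0, 0, 0, $d, $m - 1, $y);   # months count from 0
    }

    print scalar(gmtime(iso_date_to_epoch('1987-12-18'))), "\n";
    # prints "Fri Dec 18 00:00:00 1987"

Only the text form is stored; the numeric value is recomputed on
whichever system reads it.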

When calculating specific times, such as for tests in time or date modules,
it may be appropriate to calculate an offset for the epoch.

    require Time::Local;
    $offset = Time::Local::timegm(0, 0, 0, 1, 0, 70);

The value for C<$offset> in Unix will be C<0>, but in Mac OS will be
some large number.  C<$offset> can then be added to a Unix time value
to get what should be the proper value on any system.

=head2 Character sets and character encoding

Assume very little about character sets.

Assume nothing about numerical values (C<ord>, C<chr>) of characters.
Do not use explicit code point ranges (like C<\xHH-\xHH>); use for
example symbolic character classes like C<[:print:]>.
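
As a small illustration, the hypothetical check below asks whether a
string is purely alphabetic without naming any code points.

    # Hypothetical check: does $text consist only of alphabetic characters?
    sub is_alpha_word {
        my ($text) = @_;
        return $text =~ /\A[[:alpha:]]+\z/ ? 1 : 0;
    }

    print is_alpha_word('Perl'),  "\n";   # prints "1"
    print is_alpha_word('Perl5'), "\n";   # prints "0"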

Do not assume that the alphabetic characters are encoded contiguously
(in the numeric sense).  There may be gaps.

Do not assume anything about the ordering of the characters.
The lowercase letters may come before or after the uppercase letters;
the lowercase and uppercase may be interlaced so that both `a' and `A'
come before `b'; the accented and other international characters may
be interlaced so that E<auml> comes before `b'.

=head2 Internationalisation

If you may assume POSIX (a rather large assumption), you may read
more about the POSIX locale system from L<perllocale>.  The locale
system at least attempts to make things a little bit more portable,
or at least more convenient and native-friendly for non-English
users.  The system affects character sets and encoding, and date
and time formatting--amongst other things.

=head2 System Resources

If your code is destined for systems with severely constrained (or
missing!) virtual memory systems then you want to be I<especially> mindful
of avoiding wasteful constructs such as:

    # NOTE: this is no longer "bad" in perl5.005
    for (0..10000000) {}                       # bad
    for (my $x = 0; $x <= 10000000; ++$x) {}   # good

    @lines = <VERY_LARGE_FILE>;                # bad

    while (<FILE>) {$file .= $_}               # sometimes bad
    $file = join('', <FILE>);                  # better

The last two constructs may appear unintuitive to most people.  The
first repeatedly grows a string, whereas the second allocates a
large chunk of memory in one go.  On some systems, the second is
more efficient than the first.

=head2 Security

Most multi-user platforms provide basic levels of security, usually
implemented at the filesystem level.  Some, however, unfortunately do
not.  Thus the notion of user id, or "home" directory,
or even the state of being logged-in, may be unrecognizable on many
platforms.  If you write programs that are security-conscious, it
is usually best to know what type of system you will be running
under so that you can write code explicitly for that platform (or
class of platforms).

Don't assume the UNIX filesystem access semantics: the operating
system or the filesystem may be using some ACL systems, which are
richer languages than the usual rwx.  Even if the rwx exist,
their semantics might be different.

(From a security viewpoint, testing for permissions before attempting to
do something is silly anyway: if one tries this, there is potential
for race conditions, since someone or something might change the
permissions between the permissions check and the actual operation.
Just try the operation.)
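
A sketch of "just trying", using a hypothetical helper that reports the
reason for a failure instead of testing C<-r> beforehand:

    # Hypothetical helper: return the file's contents, or undef plus the reason.
    sub slurp_or_reason {
        my ($path) = @_;
        open(my $fh, '<', $path)
            or return (undef, "$!");     # no permission test first; just try
        local $/;                        # read the whole file at once
        my $data = <$fh>;
        close($fh);
        return ($data, undef);
    }

    my ($data, $why) = slurp_or_reason('no-such-file.txt');
    print "could not read: $why\n" unless defined $data;

The check and the operation are now one and the same step, so there is
no window in which the permissions can change.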

Don't assume the UNIX user and group semantics: especially, don't
expect the C<< $< >> and C<< $> >> (or the C<$(> and C<$)>) to work
for switching identities (or memberships).

Don't assume set-uid and set-gid semantics. (And even if you do,
think twice: set-uid and set-gid are a known can of security worms.)

=head2 Style

For those times when it is necessary to have platform-specific code,
consider keeping the platform-specific code in one place, making porting
to other platforms easier.  Use the Config module and the special
variable C<$^O> to differentiate platforms, as described in
L<"PLATFORMS">.
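
One way to keep such code in one place is a single routine that maps
C<$^O> onto the few cases the program cares about.  The sketch below is
only an illustration: C<platform_family> and its return values are
hypothetical, and the fallback to "unix" is an assumption of this
example, not a rule.

    # Hypothetical helper: the only place that inspects $^O directly.
    sub platform_family {
        my ($os) = @_ ? @_ : ($^O);
        return 'dosish' if $os =~ /\A(?:MSWin32|dos|os2)\z/;
        return 'vms'    if $os eq 'VMS';
        return 'macos'  if $os eq 'MacOS';
        return 'unix';                     # assumed default for this sketch
    }

    print platform_family(), "\n";

The rest of the program then branches on the returned name, so porting
to a new platform means touching one subroutine.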

Be careful in the tests you supply with your module or programs.
Module code may be fully portable, but its tests might not be.  This
often happens when tests spawn off other processes or call external
programs to aid in the testing, or when (as noted above) the tests
assume certain things about the filesystem and paths.  Be careful
not to depend on a specific output style for errors, such as when
checking C<$!> after a system call.  Some platforms expect a certain
output format, and perl on those platforms may have been adjusted
accordingly.  Most specifically, don't anchor a regex when testing
an error value.

=head1 CPAN Testers

Modules uploaded to CPAN are tested by a variety of volunteers on
different platforms.  These CPAN testers are notified by mail of each
new upload, and reply to the list with PASS, FAIL, NA (not applicable to
this platform), or UNKNOWN (unknown), along with any relevant notations.

The purpose of the testing is twofold: one, to help developers fix any
problems in their code that crop up because of lack of testing on other
platforms; two, to provide users with information about whether
a given module works on a given platform.

=over 4

=item Mailing list: cpan-testers@perl.org

=item Testing results: http://testers.cpan.org/

=back

=head1 PLATFORMS

As of version 5.002, Perl is built with a C<$^O> variable that
indicates the operating system it was built on.  This was implemented
to help speed up code that would otherwise have to C<use Config>
and use the value of C<$Config{osname}>.  Of course, to get more
detailed information about the system, looking into C<%Config> is
certainly recommended.

C<%Config> cannot always be trusted, however, because it was built
at compile time.  If perl was built in one place, then transferred
elsewhere, some values may be wrong.  The values may even have been
edited after the fact.

=head2 Unix

Perl works on a bewildering variety of Unix and Unix-like platforms (see
e.g. most of the files in the F<hints/> directory in the source code kit).
On most of these systems, the value of C<$^O> (hence C<$Config{'osname'}>,
too) is determined either by lowercasing and stripping punctuation from the
first field of the string returned by typing C<uname -a> (or a similar command)
at the shell prompt or by testing the file system for the presence of
uniquely named files such as a kernel or header file.  Here, for example,
are a few of the more popular Unix flavors:

    uname         $^O        $Config{'archname'}
    --------------------------------------------
    AIX           aix        aix
    BSD/OS        bsdos      i386-bsdos
    Darwin        darwin     darwin
    dgux          dgux       AViiON-dgux
    DYNIX/ptx     dynixptx   i386-dynixptx
    FreeBSD       freebsd    freebsd-i386    
    Linux         linux      arm-linux
    Linux         linux      i386-linux
    Linux         linux      i586-linux
    Linux         linux      ppc-linux
    HP-UX         hpux       PA-RISC1.1
    IRIX          irix       irix
    Mac OS X      darwin     darwin
    MachTen PPC   machten    powerpc-machten
    NeXT 3        next       next-fat
    NeXT 4        next       OPENSTEP-Mach
    openbsd       openbsd    i386-openbsd
    OSF1          dec_osf    alpha-dec_osf
    reliantunix-n svr4       RM400-svr4
    SCO_SV        sco_sv     i386-sco_sv
    SINIX-N       svr4       RM400-svr4
    sn4609        unicos     CRAY_C90-unicos
    sn6521        unicosmk   t3e-unicosmk
    sn9617        unicos     CRAY_J90-unicos
    SunOS         solaris    sun4-solaris
    SunOS         solaris    i86pc-solaris
    SunOS4        sunos      sun4-sunos

Because the value of C<$Config{archname}> may depend on the
hardware architecture, it can vary more than the value of C<$^O>.

=head2 DOS and Derivatives

Perl has long been ported to Intel-style microcomputers running under
systems like PC-DOS, MS-DOS, OS/2, and most Windows platforms you can
bring yourself to mention (except for Windows CE, if you count that).
Users familiar with I<COMMAND.COM> or I<CMD.EXE> style shells should
be aware that each of these file specifications may have subtle
differences:

    $filespec0 = "c:/foo/bar/file.txt";
    $filespec1 = "c:\\foo\\bar\\file.txt";
    $filespec2 = 'c:\foo\bar\file.txt';
    $filespec3 = 'c:\\foo\\bar\\file.txt';

System calls accept either C</> or C<\> as the path separator.
However, many command-line utilities of DOS vintage treat C</> as
the option prefix, so may get confused by filenames containing C</>.
Aside from calling any external programs, C</> will work just fine,
and probably better, as it is more consistent with popular usage,
and avoids the problem of remembering what to backwhack and what
not to.

The DOS FAT filesystem can accommodate only "8.3" style filenames.  Under
the "case-insensitive, but case-preserving" HPFS (OS/2) and NTFS (NT)
filesystems you may have to be careful about case returned with functions
like C<readdir> or used with functions like C<open> or C<opendir>.

DOS also treats several filenames as special, such as AUX, PRN,
NUL, CON, COM1, LPT1, LPT2, etc.  Unfortunately, sometimes these
filenames won't even work if you include an explicit directory
prefix.  It is best to avoid such filenames, if you want your code
to be portable to DOS and its derivatives.  It's hard to know what
these all are, unfortunately.

Users of these operating systems may also wish to make use of
scripts such as I<pl2bat.bat> or I<pl2cmd> to
put wrappers around your scripts.

Newline (C<\n>) is translated as C<\015\012> by STDIO when reading from
and writing to files (see L<"Newlines">).  C<binmode(FILEHANDLE)>
will keep C<\n> translated as C<\012> for that filehandle.  Since it is a
no-op on other systems, C<binmode> should be used for cross-platform code
that deals with binary data.  That's assuming you realize in advance
that your data is in binary.  General-purpose programs should
often assume nothing about their data.
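
A minimal sketch of the C<binmode> habit described above: write a few
bytes that text-mode translation could alter, read them back, and
compare.  The sample bytes are arbitrary and the temporary file comes
from the core C<File::Temp> module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Arbitrary bytes, including \015\012 and \032 (Ctrl-Z).
my $data = "GIF89a\015\012\032\000";

my ($out, $filename) = tempfile(UNLINK => 1);
binmode($out);                 # no-op on Unix, matters on DOSish systems
print {$out} $data;
close($out) or die "Can't close $filename: $!";

open(my $in, '<', $filename) or die "Can't open $filename: $!";
binmode($in);
my $copy = do { local $/; <$in> };   # slurp the whole file
close($in);

print "round trip preserved ", length($copy), " bytes\n" if $copy eq $data;
```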

The C<$^O> variable and the C<$Config{archname}> values for various
DOSish perls are as follows:

     OS            $^O      $Config{archname}   ID    Version
     --------------------------------------------------------
     MS-DOS        dos        ?                 
     PC-DOS        dos        ?                 
     OS/2          os2        ?
     Windows 3.1   ?          ?                 0      3 01
     Windows 95    MSWin32    MSWin32-x86       1      4 00
     Windows 98    MSWin32    MSWin32-x86       1      4 10
     Windows ME    MSWin32    MSWin32-x86       1      ?
     Windows NT    MSWin32    MSWin32-x86       2      4 xx
     Windows NT    MSWin32    MSWin32-ALPHA     2      4 xx
     Windows NT    MSWin32    MSWin32-ppc       2      4 xx
     Windows 2000  MSWin32    MSWin32-x86       2      5 xx
     Windows XP    MSWin32    MSWin32-x86       2      ?
     Windows CE    MSWin32    ?                 3           
     Cygwin        cygwin     ?                 

The various MSWin32 Perls can distinguish the OS they are running on
via the value of the fifth element of the list returned from
Win32::GetOSVersion().  For example:

    if ($^O eq 'MSWin32') {
        my @os_version_info = Win32::GetOSVersion();
        print +('3.1','95','NT')[$os_version_info[4]],"\n";
    }

Also see:

=over 4

=item *

The djgpp environment for DOS, http://www.delorie.com/djgpp/
and L<perldos>.

=item *

The EMX environment for DOS, OS/2, etc. emx@iaehv.nl,
http://www.leo.org/pub/comp/os/os2/leo/gnu/emx+gcc/index.html or
ftp://hobbes.nmsu.edu/pub/os2/dev/emx.  Also L<perlos2>.

=item *

Build instructions for Win32 in L<perlwin32>, or under the Cygnus environment
in L<perlcygwin>.  

=item *

The C<Win32::*> modules in L<Win32>.

=item *

The ActiveState Pages, http://www.activestate.com/

=item *

The Cygwin environment for Win32; F<README.cygwin> (installed 
as L<perlcygwin>), http://www.cygwin.com/

=item *

The U/WIN environment for Win32,
http://www.research.att.com/sw/tools/uwin/

=item *

Build instructions for OS/2, L<perlos2>

=back

=head2 S<Mac OS>

Any module requiring XS compilation is right out for most people, because
MacPerl is built using non-free (and non-cheap!) compilers.  Some XS
modules that can work with MacPerl are built and distributed in binary
form on CPAN.

Directories are specified as:

    volume:folder:file              for absolute pathnames
    volume:folder:                  for absolute pathnames
    :folder:file                    for relative pathnames
    :folder:                        for relative pathnames
    :file                           for relative pathnames
    file                            for relative pathnames

Files are stored in the directory in alphabetical order.  Filenames are
limited to 31 characters, and may include any character except for
null and C<:>, which is reserved as the path separator.

Instead of C<flock>, see C<FSpSetFLock> and C<FSpRstFLock> in the
Mac::Files module, or C<chmod(0444, ...)> and C<chmod(0666, ...)>.

In the MacPerl application, you can't run a program from the command line;
programs that expect C<@ARGV> to be populated can be edited with something
like the following, which brings up a dialog box asking for the command
line arguments.

    if (!@ARGV) {
        @ARGV = split /\s+/, MacPerl::Ask('Arguments?');
    }

A MacPerl script saved as a "droplet" will populate C<@ARGV> with the full
pathnames of the files dropped onto the script.

Mac users can run programs under a type of command line interface
under MPW (Macintosh Programmer's Workshop, a free development
environment from Apple).  MacPerl was first introduced as an MPW
tool, and MPW can be used like a shell:

    perl myscript.plx some arguments

ToolServer is another app from Apple that provides access to MPW tools
from MPW and the MacPerl app, which allows MacPerl programs to use
C<system>, backticks, and piped C<open>.

"S<Mac OS>" is the proper name for the operating system, but the value
in C<$^O> is "MacOS".  To determine architecture, version, or whether
the application or MPW tool version is running, check:

    $is_app    = $MacPerl::Version =~ /App/;
    $is_tool   = $MacPerl::Version =~ /MPW/;
    ($version) = $MacPerl::Version =~ /^(\S+)/;
    $is_ppc    = $MacPerl::Architecture eq 'MacPPC';
    $is_68k    = $MacPerl::Architecture eq 'Mac68K';

S<Mac OS X>, based on NeXT's OpenStep OS, runs MacPerl natively, under the
"Classic" environment.  There is no "Carbon" version of MacPerl to run
under the primary Mac OS X environment.  S<Mac OS X> and its Open Source
version, Darwin, both run Unix perl natively.

Also see:

=over 4

=item *

MacPerl Development, http://dev.macperl.org/ .

=item *

The MacPerl Pages, http://www.macperl.com/ .

=item *

The MacPerl mailing lists, http://lists.perl.org/ .

=back

=head2 VMS

Perl on VMS is discussed in L<perlvms> in the perl distribution.
Perl on VMS can accept either VMS- or Unix-style file
specifications as in either of the following:

    $ perl -ne "print if /perl_setup/i" SYS$LOGIN:LOGIN.COM
    $ perl -ne "print if /perl_setup/i" /sys$login/login.com

but not a mixture of both as in:

    $ perl -ne "print if /perl_setup/i" sys$login:/login.com
    Can't open sys$login:/login.com: file specification syntax error

Interacting with Perl from the Digital Command Language (DCL) shell
often requires a different set of quotation marks than Unix shells do.
For example:

    $ perl -e "print ""Hello, world.\n"""
    Hello, world.

There are several ways to wrap your perl scripts in DCL F<.COM> files, if
you are so inclined.  For example:

    $ write sys$output "Hello from DCL!"
    $ if p1 .eqs. ""
    $ then perl -x 'f$environment("PROCEDURE")
    $ else perl -x - 'p1 'p2 'p3 'p4 'p5 'p6 'p7 'p8
    $ deck/dollars="__END__"
    #!/usr/bin/perl

    print "Hello from Perl!\n";

    __END__
    $ endif

Do take care with C<$ ASSIGN/nolog/user SYS$COMMAND: SYS$INPUT> if your
perl-in-DCL script expects to do things like C<< $read = <STDIN>; >>.

Filenames are in the format "name.extension;version".  The maximum
length for filenames is 39 characters, and the maximum length for
extensions is also 39 characters.  Version is a number from 1 to
32767.  Valid characters are C</[A-Z0-9$_-]/>.

VMS's RMS filesystem is case-insensitive and does not preserve case.
C<readdir> returns lowercased filenames, but specifying a file for
opening remains case-insensitive.  Files without extensions have a
trailing period on them, so doing a C<readdir> with a file named F<A.;5>
will return F<a.> (though that file could be opened with
C<open(FH, 'A')>).

RMS had an eight level limit on directory depths from any rooted logical
(allowing 16 levels overall) prior to VMS 7.2.  Hence
C<PERL_ROOT:[LIB.2.3.4.5.6.7.8]> is a valid directory specification but
C<PERL_ROOT:[LIB.2.3.4.5.6.7.8.9]> is not.  F<Makefile.PL> authors might
have to take this into account, but at least they can refer to the former
as C</PERL_ROOT/lib/2/3/4/5/6/7/8/>.
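
A F<Makefile.PL> author who wants to check this limit ahead of time
could count the levels in a directory specification.  The following is
a rough sketch; C<vms_dir_depth> is a hypothetical name, and the
function looks only at the part inside the square brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: count the directory levels inside the [...]
# part of a VMS directory specification.
sub vms_dir_depth {
    my ($spec) = @_;
    my ($dirs) = $spec =~ /\[([^\]]*)\]/ or return 0;
    return scalar grep { length } split /\./, $dirs;
}

for my $spec ('PERL_ROOT:[LIB.2.3.4.5.6.7.8]',
              'PERL_ROOT:[LIB.2.3.4.5.6.7.8.9]') {
    my $depth = vms_dir_depth($spec);
    printf "%s has %d levels: %s\n", $spec, $depth,
        $depth <= 8 ? 'fine before VMS 7.2' : 'too deep before VMS 7.2';
}
```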

The VMS::Filespec module, which gets installed as part of the build
process on VMS, is a pure Perl module that can easily be installed on
non-VMS platforms and can be helpful for conversions to and from RMS
native formats.

What C<\n> represents depends on the type of file opened.  It usually
represents C<\012> but it could also be C<\015>, C<\012>, C<\015\012>, 
C<\000>, C<\040>, or nothing depending on the file organization and
record format.  The VMS::Stdio module provides access to the 
special fopen() requirements of files with unusual attributes on VMS.

TCP/IP stacks are optional on VMS, so socket routines might not be
implemented.  UDP sockets may not be supported.

The value of C<$^O> on OpenVMS is "VMS".  To determine the architecture
that you are running on without resorting to loading all of C<%Config>
you can examine the content of the C<@INC> array like so:

    if (grep(/VMS_AXP/, @INC)) {
        print "I'm on Alpha!\n";

    } elsif (grep(/VMS_VAX/, @INC)) {
        print "I'm on VAX!\n";

    } else {
        print "I'm not so sure about where $^O is...\n";
    }

On VMS, perl determines the UTC offset from the C<SYS$TIMEZONE_DIFFERENTIAL>
logical name.  Although the VMS epoch began at 17-NOV-1858 00:00:00.00,
calls to C<localtime> are adjusted to count offsets from
01-JAN-1970 00:00:00.00, just like Unix.

Also see:

=over 4

=item *

F<README.vms> (installed as L<README_vms>), L<perlvms>

=item *

vmsperl list, majordomo@perl.org

(Put the words C<subscribe vmsperl> in message body.)

=item *

vmsperl on the web, http://www.sidhe.org/vmsperl/index.html

=back

=head2 VOS

Perl on VOS is discussed in F<README.vos> in the perl distribution
(installed as L<perlvos>).  Perl on VOS can accept either VOS- or
Unix-style file specifications as in either of the following:

    $ perl -ne "print if /perl_setup/i" >system>notices
    $ perl -ne "print if /perl_setup/i" /system/notices

or even a mixture of both as in:

    $ perl -ne "print if /perl_setup/i" >system/notices

Even though VOS allows the slash character to appear in object
names, because the VOS port of Perl interprets it as a pathname
delimiting character, VOS files, directories, or links whose names
contain a slash character cannot be processed.  Such files must be
renamed before they can be processed by Perl.  Note that VOS limits
file names to 32 or fewer characters.

See F<README.vos> for restrictions that apply when Perl is built
with the alpha version of VOS POSIX.1 support.

Perl on VOS is built without any extensions and does not support
dynamic loading.

The value of C<$^O> on VOS is "VOS".  To determine the architecture that
you are running on without resorting to loading all of C<%Config> you
can examine the content of the C<@INC> array like so:

    if ($^O =~ /VOS/) {
        print "I'm on a Stratus box!\n";
    } else {
        print "I'm not on a Stratus box!\n";
        die;
    }

    if (grep(/860/, @INC)) {
        print "This box is a Stratus XA/R!\n";

    } elsif (grep(/7100/, @INC)) {
        print "This box is a Stratus HP 7100 or 8xxx!\n";

    } elsif (grep(/8000/, @INC)) {
        print "This box is a Stratus HP 8xxx!\n";

    } else {
        print "This box is a Stratus 68K!\n";
    }

Also see:

=over 4

=item *

F<README.vos>

=item *

The VOS mailing list.

There is no specific mailing list for Perl on VOS.  You can post
comments to the comp.sys.stratus newsgroup, or subscribe to the general
Stratus mailing list.  Send a letter with "Subscribe Info-Stratus" in
the message body to majordomo@list.stratagy.com.

=item *

VOS Perl on the web at http://ftp.stratus.com/pub/vos/vos.html

=back

=head2 EBCDIC Platforms

Recent versions of Perl have been ported to platforms such as OS/400 on
AS/400 minicomputers as well as OS/390, VM/ESA, and BS2000 for S/390
Mainframes.  Such computers use EBCDIC character sets internally (usually
Character Code Set ID 0037 for OS/400 and either 1047 or POSIX-BC for S/390
systems).  On the mainframe perl currently works under the "Unix system
services for OS/390" (formerly known as OpenEdition), VM/ESA OpenEdition, or
the BS2000 POSIX-BC system (BS2000 is supported in perl 5.6 and greater).
See L<perlos390> for details.  

As of R2.5 of USS for OS/390 and Version 2.3 of VM/ESA these Unix
sub-systems do not support the C<#!> shebang trick for script invocation.
Hence, on OS/390 and VM/ESA perl scripts can be executed with a header
similar to the following simple script:

    : # use perl
        eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
            if 0;
    #!/usr/local/bin/perl     # just a comment really

    print "Hello from perl!\n";

OS/390 will support the C<#!> shebang trick in release 2.8 and beyond.
Calls to C<system> and backticks can use POSIX shell syntax on all
S/390 systems.

On the AS/400, if PERL5 is in your library list, you may need
to wrap your perl scripts in a CL procedure to invoke them like so:

    BEGIN
      CALL PGM(PERL5/PERL) PARM('/QOpenSys/hello.pl')
    ENDPGM

This will invoke the perl script F<hello.pl> in the root of the
QOpenSys file system.  On the AS/400 calls to C<system> or backticks
must use CL syntax.

On these platforms, bear in mind that the EBCDIC character set may have
an effect on what happens with some perl functions (such as C<chr>,
C<pack>, C<print>, C<printf>, C<ord>, C<sort>, C<sprintf>, C<unpack>), as
well as bit-fiddling with ASCII constants using operators like C<^>, C<&>
and C<|>, not to mention dealing with socket interfaces to ASCII computers
(see L<"Newlines">).

Fortunately, most web servers for the mainframe will correctly
translate the C<\n> in the following statement to its ASCII equivalent
(C<\r> is the same under both Unix and OS/390 & VM/ESA):

    print "Content-type: text/html\r\n\r\n";

The values of C<$^O> on some of these platforms include:

    uname         $^O        $Config{'archname'}
    --------------------------------------------
    OS/390        os390      os390
    OS400         os400      os400
    POSIX-BC      posix-bc   BS2000-posix-bc
    VM/ESA        vmesa      vmesa

Some simple tricks for determining if you are running on an EBCDIC
platform could include any of the following (perhaps all):

    if ("\t" eq "\05")   { print "EBCDIC may be spoken here!\n"; }

    if (ord('A') == 193) { print "EBCDIC may be spoken here!\n"; }

    if (chr(169) eq 'z') { print "EBCDIC may be spoken here!\n"; }

One thing you may not want to rely on is the EBCDIC encoding
of punctuation characters since these may differ from code page to code
page (and once your module or script is rumoured to work with EBCDIC,
folks will want it to work with all EBCDIC character sets).
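
In that spirit, a detection routine might look only at letters and
digits and leave punctuation out entirely.  This is a sketch, not an
established interface: C<is_ebcdic> is a hypothetical name, and the
three code points are the usual EBCDIC values for C<A>, C<a>, and C<0>.

```perl
use strict;
use warnings;

# Hypothetical helper: true if letters and digits sit at their usual
# EBCDIC positions.  Punctuation is deliberately not consulted.
sub is_ebcdic {
    return ord('A') == 193 && ord('a') == 129 && ord('0') == 240 ? 1 : 0;
}

print is_ebcdic()
    ? "EBCDIC may be spoken here!\n"
    : "This looks like an ASCII-based platform.\n";
```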

Also see:

=over 4

=item *

L<perlos390>, F<README.os390>, F<perlbs2000>, F<README.vmesa>,
L<perlebcdic>.

=item *

The perl-mvs@perl.org list is for discussion of porting issues as well as
general usage issues for all EBCDIC Perls.  Send a message body of
"subscribe perl-mvs" to majordomo@perl.org.

=item *

AS/400 Perl information at
http://as400.rochester.ibm.com/
as well as on CPAN in the F<ports/> directory.

=back

=head2 Acorn RISC OS

Because Acorns use ASCII with newlines (C<\n>) in text files as C<\012> like
Unix, and because Unix filename emulation is turned on by default, 
most simple scripts will probably work "out of the box".  The native
filesystem is modular, and individual filesystems are free to be
case-sensitive or insensitive, and are usually case-preserving.  Some
native filesystems have name length limits, and file and directory
names are silently truncated to fit them.  Scripts should be aware that the
standard filesystem currently has a name length limit of B<10>
characters, with up to 77 items in a directory, but other filesystems
may not impose such limitations.

Native filenames are of the form

    Filesystem#Special_Field::DiskName.$.Directory.Directory.File

where

    Special_Field is not usually present, but may contain . and $ .
    Filesystem =~ m|[A-Za-z0-9_]|
    DiskName   =~ m|[A-Za-z0-9_/]|
    $ represents the root directory
    . is the path separator
    @ is the current directory (per filesystem but machine global)
    ^ is the parent directory
    Directory and File =~ m|[^\0- "\.\$\%\&:\@\\^\|\177]+|

The default filename translation is roughly C<tr|/.|./|;>

Note that C<"ADFS::HardDisk.$.File" ne 'ADFS::HardDisk.$.File'> and that
the second stage of C<$> interpolation in regular expressions will fall
foul of the C<$.> if scripts are not careful.
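
The difference is easy to see on any platform.  In the sketch below a
line is read from an in-memory file first, purely so that C<$.> has a
defined value; C<\Q> and C<\E> are then used to keep the C<$> and the
dots from being read as regular expression syntax.

```perl
use strict;
use warnings;

# Read one line so that $. (the input line number) is defined.
my $text = "first line\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my $line = <$fh>;

my $literal      = 'ADFS::HardDisk.$.File';   # single quotes: $. kept as-is
my $interpolated = "ADFS::HardDisk.$.File";   # double quotes: $. replaced

print "the two strings differ\n" if $literal ne $interpolated;

# Quoted, the name matches itself; unquoted, the "$" acts as an anchor
# in the middle of the pattern and the match fails.
print "quoted pattern matches\n" if $literal =~ /^\Q$literal\E\z/;
print "unquoted pattern fails\n" if $literal !~ /^$literal\z/;
close($fh);
```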

Logical paths specified by system variables containing comma-separated
search lists are also allowed; hence C<System:Modules> is a valid
filename, and the filesystem will prefix C<Modules> with each section of
C<System$Path> until a name is made that points to an object on disk.
Writing to a new file C<System:Modules> would be allowed only if
C<System$Path> contains a single item list.  The filesystem will also
expand system variables in filenames if enclosed in angle brackets, so
C<< <System$Dir>.Modules >> would look for the file
S<C<$ENV{'System$Dir'} . 'Modules'>>.  The obvious implication of this is
that B<fully qualified filenames can start with C<< <> >>> and should
be protected when C<open> is used for input.
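
One way to protect such names is the three-argument form of C<open>
(available since perl 5.6), where the mode is a separate argument and
the filename is never scanned for a leading C<< < >>.  The sketch below
was written with a Unix-like system in mind, where such a name is legal;
the filename is made up, and the script works in a temporary directory.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir($dir) or die "Can't chdir to $dir: $!";

my $name = '<System$Dir>.Modules';      # hypothetical name starting with "<"

# The mode is its own argument, so the leading "<" stays part of the name.
open(my $out, '>', $name) or die "Can't create $name: $!";
print {$out} "module list\n";
close($out) or die "Can't close $name: $!";

open(my $in, '<', $name) or die "Can't open $name: $!";
my $first = <$in>;
close($in);
print "read back: $first";

chdir($start) or die "Can't chdir back to $start: $!";
```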

Because C<.> was in use as a directory separator and filenames could not
be assumed to be unique after 10 characters, Acorn implemented the C
compiler to strip the trailing C<.c> C<.h> C<.s> and C<.o> suffix from
filenames specified in source code and store the respective files in
subdirectories named after the suffix.  Hence files are translated:

    foo.h           h.foo
    C:foo.h         C:h.foo        (logical path variable)
    sys/os.h        sys.h.os       (C compiler groks Unix-speak)
    10charname.c    c.10charname
    10charname.o    o.10charname
    11charname_.c   c.11charname   (assuming filesystem truncates at 10)

The Unix emulation library's translation of filenames to native assumes
that this sort of translation is required, and it allows a user-defined list
of known suffixes that it will transpose in this fashion.  This may
seem transparent, but consider that with these rules C<foo/bar/baz.h>
and C<foo/bar/h/baz> both map to C<foo.bar.h.baz>, and that C<readdir> and
C<glob> cannot and do not attempt to emulate the reverse mapping.  Other
C<.>'s in filenames are translated to C</>.
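
The mapping can be approximated in a few lines, which also makes the
collision easy to demonstrate.  This is only a sketch of the rules as
described here: C<unix_to_riscos> is a hypothetical name, the suffix
list is fixed at C<c>, C<h>, C<s>, and C<o>, and truncation and absolute
paths are ignored.

```perl
use strict;
use warnings;

my %SWAP_SUFFIX = map { ($_ => 1) } qw(c h s o);

# Hypothetical helper: translate a relative Unix-style path to the
# native form, moving a known suffix in front of the leaf name.
sub unix_to_riscos {
    my ($path) = @_;
    my @parts = split m{/}, $path;
    my $file  = pop @parts;
    $file = '' unless defined $file;
    if ($file =~ /^(.+)\.([^.]+)$/ && $SWAP_SUFFIX{$2}) {
        push @parts, $2, $1;          # baz.h becomes h, baz
    }
    else {
        push @parts, $file;
    }
    tr{.}{/} for @parts;              # any other "." becomes "/"
    return join '.', @parts;
}

for my $path ('foo.h', 'sys/os.h', 'foo/bar/baz.h', 'foo/bar/h/baz') {
    printf "%-16s %s\n", $path, unix_to_riscos($path);
}
```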

As implied above, the environment accessed through C<%ENV> is global, and
the convention is that program specific environment variables are of the
form C<Program$Name>.  Each filesystem maintains a current directory,
and the current filesystem's current directory is the B<global> current
directory.  Consequently, sociable programs don't change the current
directory but rely on full pathnames, and programs (and Makefiles) cannot
assume that they can spawn a child process which can change the current
directory without affecting its parent (and everyone else for that
matter).

Because native operating system filehandles are global and are currently 
allocated down from 255, with 0 being a reserved value, the Unix emulation
library emulates Unix filehandles.  Consequently, you can't rely on
passing C<STDIN>, C<STDOUT>, or C<STDERR> to your children.

The desire of users to express filenames of the form
C<< <Foo$Dir>.Bar >> on the command line unquoted causes problems,
too: C<``> command output capture has to perform a guessing game.  It
assumes that a string C<< <[^<>]+\$[^<>]> >> is a
reference to an environment variable, whereas anything else involving
C<< < >> or C<< > >> is redirection, and generally manages to be 99%
right.  Of course, the problem remains that scripts cannot rely on any
Unix tools being available, or that any tools found have Unix-like command
line arguments.

Extensions and XS are, in theory, buildable by anyone using free
tools.  In practice, many don't, as users of the Acorn platform are
used to binary distributions.  MakeMaker does run, but no available
make currently copes with MakeMaker's makefiles; even if and when
this should be fixed, the lack of a Unix-like shell will cause
problems with makefile rules, especially lines of the form C<cd
sdbm && make all>, and anything using quoting.

"S<RISC OS>" is the proper name for the operating system, but the value
in C<$^O> is "riscos" (because we don't like shouting).

=head2 Other perls

Perl has been ported to many platforms that do not fit into any of
the categories listed above.  Some, such as AmigaOS, Atari MiNT,
BeOS, HP MPE/iX, QNX, Plan 9, and VOS, have been well-integrated
into the standard Perl source code kit.  You may need to see the
F<ports/> directory on CPAN for information, and possibly binaries,
for the likes of: aos, Atari ST, lynxos, riscos, Novell Netware,
Tandem Guardian, I<etc.>  (Yes, we know that some of these OSes may
fall under the Unix category, but we are not a standards body.)

Some approximate operating system names and their C<$^O> values
in the "OTHER" category include:

    OS            $^O        $Config{'archname'}
    ------------------------------------------
    Amiga DOS     amigaos    m68k-amigos
    MPE/iX        mpeix      PA-RISC1.1

See also:

=over 4

=item *

Amiga, F<README.amiga> (installed as L<perlamiga>).

=item *

Atari, F<README.mint> and Guido Flohr's web page
http://stud.uni-sb.de/~gufl0000/

=item *

Be OS, F<README.beos>

=item *

HP 3000 MPE/iX, F<README.mpeix> and Mark Bixby's web page
http://www.bixby.org/mark/perlix.html

=item *

A free perl5-based PERL.NLM for Novell Netware is available in
precompiled binary and source code form from http://www.novell.com/
as well as from CPAN.

=item *

Plan 9, F<README.plan9>

=back

=head1 FUNCTION IMPLEMENTATIONS

Listed below are functions that are either completely unimplemented
or else have been implemented differently on various platforms.
Following each description will be, in parentheses, a list of
platforms that the description applies to.

The list may well be incomplete, or even wrong in some places.  When
in doubt, consult the platform-specific README files in the Perl
source distribution, and any other documentation resources accompanying
a given port.

Be aware, moreover, that even among Unix-ish systems there are variations.

For many functions, you can also query C<%Config>, exported by
default from the Config module.  For example, to check whether the
platform has the C<lstat> call, check C<$Config{d_lstat}>.  See
L<Config> for a full description of available variables.
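
A small wrapper keeps such checks readable.  In this sketch the name
C<has_config_feature> is hypothetical; it relies only on the convention
that Configure records an available feature as the string C<define>.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: true if Configure recorded the feature as available.
sub has_config_feature {
    my ($key) = @_;
    return defined $Config{$key} && $Config{$key} eq 'define' ? 1 : 0;
}

# Use lstat where it exists and fall back to plain stat elsewhere.
my $file = $^X;                       # the running perl, as a sample file
my @info = has_config_feature('d_lstat') ? lstat($file) : stat($file);

print "lstat is ", (has_config_feature('d_lstat') ? '' : 'not '),
      "available\n";
```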

=head2 Alphabetical Listing of Perl Functions

=over 8

=item -X FILEHANDLE

=item -X EXPR

=item -X

C<-r>, C<-w>, and C<-x> have a limited meaning only; directories
and applications are executable, and there are no uid/gid
considerations.  C<-o> is not supported.  (S<Mac OS>)

C<-r>, C<-w>, C<-x>, and C<-o> tell whether the file is accessible,
which may not reflect UIC-based file protections.  (VMS)

C<-s> returns the size of the data fork, not the total size of data fork
plus resource fork.  (S<Mac OS>)

C<-s> by name on an open file will return the space reserved on disk,
rather than the current extent.  C<-s> on an open filehandle returns the
current size.  (S<RISC OS>)

C<-R>, C<-W>, C<-X>, C<-O> are indistinguishable from C<-r>, C<-w>,
C<-x>, C<-o>. (S<Mac OS>, Win32, VMS, S<RISC OS>)

C<-b>, C<-c>, C<-k>, C<-g>, C<-p>, C<-u>, C<-A> are not implemented.
(S<Mac OS>)

C<-g>, C<-k>, C<-l>, C<-p>, C<-u>, C<-A> are not particularly meaningful.
(Win32, VMS, S<RISC OS>)

C<-d> is true if passed a device spec without an explicit directory.
(VMS)

C<-T> and C<-B> are implemented, but might misclassify Mac text files
with foreign characters; this is the case with all platforms, but may
affect S<Mac OS> often.  (S<Mac OS>)

C<-x> (or C<-X>) determines if a file ends in one of the executable
suffixes.  C<-S> is meaningless.  (Win32)

C<-x> (or C<-X>) determines if a file has an executable file type.
(S<RISC OS>)

=item alarm SECONDS

=item alarm

Not implemented. (Win32)
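
Code that wants a time limit where C<alarm> exists, but must still run
where it does not, can probe for it inside C<eval>.  The following is a
sketch under that assumption; C<with_timeout> is a hypothetical helper
and simply runs the code without any limit when C<alarm> is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with a time limit where alarm() exists,
# and without one where it does not.  Returns undef on timeout.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $have_alarm = eval { alarm(0); 1 } ? 1 : 0;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds) if $have_alarm;
        $result = $code->();
        alarm(0) if $have_alarm;
        1;
    };
    if (!$ok) {
        alarm(0) if $have_alarm;
        die $@ unless $@ eq "timeout\n";   # rethrow unrelated errors
        return;
    }
    return $result;
}

my $answer = with_timeout(5, sub { 6 * 7 });
print defined $answer ? "finished: $answer\n" : "timed out\n";
```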

=item binmode FILEHANDLE

Meaningless.  (S<Mac OS>, S<RISC OS>)

Reopens file and restores pointer; if function fails, underlying
filehandle may be closed, or pointer may be in a different position.
(VMS)

The value returned by C<tell> may be affected after the call, and
the filehandle may be flushed. (Win32)

=item chmod LIST

Only limited meaning.  Disabling/enabling write permission is mapped to
locking/unlocking the file. (S<Mac OS>)

Only good for changing "owner" read-write access; "group" and "other"
bits are meaningless. (Win32)

Only good for changing "owner" and "other" read-write access. (S<RISC OS>)

Access permissions are mapped onto VOS access-control list changes. (VOS)

=item chown LIST

Not implemented. (S<Mac OS>, Win32, Plan9, S<RISC OS>, VOS)

Does nothing, but won't fail. (Win32)

=item chroot FILENAME

=item chroot

Not implemented. (S<Mac OS>, Win32, VMS, Plan9, S<RISC OS>, VOS, VM/ESA)

=item crypt PLAINTEXT,SALT

May not be available if library or source was not provided when building
perl. (Win32)

Not implemented. (VOS)

=item dbmclose HASH

Not implemented. (VMS, Plan9, VOS)

=item dbmopen HASH,DBNAME,MODE

Not implemented. (VMS, Plan9, VOS)

=item dump LABEL

Not useful. (S<Mac OS>, S<RISC OS>)

Not implemented. (Win32)

Invokes VMS debugger. (VMS)

=item exec LIST

Not implemented. (S<Mac OS>)

Implemented via Spawn. (VM/ESA)

Does not automatically flush output handles on some platforms.
(SunOS, Solaris, HP-UX)

=item exit EXPR

=item exit

Emulates UNIX exit() (which considers C<exit 1> to indicate an error) by
mapping the C<1> to SS$_ABORT (C<44>).  This behavior may be overridden
with the pragma C<use vmsish 'exit'>.  As with the CRTL's exit()
function, C<exit 0> is also mapped to an exit status of SS$_NORMAL
(C<1>); this mapping cannot be overridden.  Any other argument to exit()
is used directly as Perl's exit status. (VMS)

=item fcntl FILEHANDLE,FUNCTION,SCALAR

Not implemented. (Win32, VMS)

=item flock FILEHANDLE,OPERATION

Not implemented. (S<Mac OS>, VMS, S<RISC OS>, VOS)

Available only on Windows NT (not on Windows 95). (Win32)
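
Because calling an unimplemented function is a fatal error, a portable
program can wrap the call in C<eval> and treat a failure as "no locking
here".  This is a sketch; C<try_exclusive_lock> is a hypothetical name,
and what to do without a lock is left to the caller.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical helper: returns 1 if locked, 0 if the lock was refused,
# and undef if flock() is not implemented on this platform.
sub try_exclusive_lock {
    my ($fh) = @_;
    my $locked = eval { flock($fh, LOCK_EX | LOCK_NB) ? 1 : 0 };
    return $locked;      # undef when flock() died as unimplemented
}

my ($fh, $filename) = tempfile(UNLINK => 1);
my $state = try_exclusive_lock($fh);
print !defined $state ? "flock unavailable\n"
    : $state          ? "locked $filename\n"
    :                   "lock refused\n";
flock($fh, LOCK_UN) if $state;
close($fh);
```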

=item fork

Not implemented. (S<Mac OS>, AmigaOS, S<RISC OS>, VOS, VM/ESA)

Emulated using multiple interpreters.  See L<perlfork>.  (Win32)

Does not automatically flush output handles on some platforms.
(SunOS, Solaris, HP-UX)
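
A program can consult C<%Config> before forking and flush its output by
hand rather than rely on the automatic flush.  The sketch below handles
only a real fork(2), reported as C<$Config{d_fork}>; the Win32 emulation
is deliberately left out, and C<run_in_child> is a hypothetical name.

```perl
use strict;
use warnings;
use Config;
use POSIX ();

# Hypothetical helper: run $code in a child process and return the
# child's raw wait status, or undef if there is no real fork().
sub run_in_child {
    my ($code) = @_;
    return unless $Config{d_fork};

    local $| = 1;                 # flush STDOUT now and keep it unbuffered
    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;

    if ($pid == 0) {              # child
        $code->();
        POSIX::_exit(0);          # leave without running END blocks
    }
    waitpid($pid, 0);             # parent: reap the child, avoiding a zombie
    return $?;
}

my $status = run_in_child(sub { print "hello from the child\n" });
print defined $status ? "child status: $status\n" : "no fork() here\n";
```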

=item getlogin

Not implemented. (S<Mac OS>, S<RISC OS>)

=item getpgrp PID

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS)

=item getppid

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>)

=item getpriority WHICH,WHO

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS, VM/ESA)

=item getpwnam NAME

Not implemented. (S<Mac OS>, Win32)

Not useful. (S<RISC OS>)

=item getgrnam NAME

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>)

=item getnetbyname NAME

Not implemented. (S<Mac OS>, Win32, Plan9)

=item getpwuid UID

Not implemented. (S<Mac OS>, Win32)

Not useful. (S<RISC OS>)

=item getgrgid GID

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>)

=item getnetbyaddr ADDR,ADDRTYPE

Not implemented. (S<Mac OS>, Win32, Plan9)

=item getprotobynumber NUMBER

Not implemented. (S<Mac OS>)

=item getservbyport PORT,PROTO

Not implemented. (S<Mac OS>)

=item getpwent

Not implemented. (S<Mac OS>, Win32, VM/ESA)

=item getgrent

Not implemented. (S<Mac OS>, Win32, VMS, VM/ESA)

=item gethostent

Not implemented. (S<Mac OS>, Win32)

=item getnetent

Not implemented. (S<Mac OS>, Win32, Plan9)

=item getprotoent

Not implemented. (S<Mac OS>, Win32, Plan9)

=item getservent

Not implemented. (Win32, Plan9)

=item setpwent

Not implemented. (S<Mac OS>, Win32, S<RISC OS>)

=item setgrent

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>)

=item sethostent STAYOPEN

Not implemented. (S<Mac OS>, Win32, Plan9, S<RISC OS>)

=item setnetent STAYOPEN

Not implemented. (S<Mac OS>, Win32, Plan9, S<RISC OS>)

=item setprotoent STAYOPEN

Not implemented. (S<Mac OS>, Win32, Plan9, S<RISC OS>)

=item setservent STAYOPEN

Not implemented. (Plan9, Win32, S<RISC OS>)

=item endpwent

Not implemented. (S<Mac OS>, MPE/iX, VM/ESA, Win32)

=item endgrent

Not implemented. (S<Mac OS>, MPE/iX, S<RISC OS>, VM/ESA, VMS, Win32)

=item endhostent

Not implemented. (S<Mac OS>, Win32)

=item endnetent

Not implemented. (S<Mac OS>, Win32, Plan9)

=item endprotoent

Not implemented. (S<Mac OS>, Win32, Plan9)

=item endservent

Not implemented. (Plan9, Win32)

=item getsockopt SOCKET,LEVEL,OPTNAME

Not implemented. (Plan9)

=item glob EXPR

=item glob

This operator is implemented via the File::Glob extension on most
platforms.  See L<File::Glob> for portability information.

=item ioctl FILEHANDLE,FUNCTION,SCALAR

Not implemented. (VMS)

Available only for socket handles, and it does what the ioctlsocket() call
in the Winsock API does. (Win32)

Available only for socket handles. (S<RISC OS>)

=item kill SIGNAL, LIST

C<kill(0, LIST)> is implemented for the sake of taint checking;
use with other signals is unimplemented. (S<Mac OS>)

Not implemented, hence not useful for taint checking. (S<RISC OS>)

C<kill()> doesn't have the semantics of C<raise()>, i.e. it doesn't send
a signal to the identified process like it does on Unix platforms.
Instead C<kill($sig, $pid)> terminates the process identified by $pid,
and makes it exit immediately with exit status $sig.  As in Unix, if
$sig is 0 and the specified process exists, it returns true without
actually terminating it. (Win32)
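
Given these differences, signal 0 is the one use of C<kill> that stays
reasonably portable.  A sketch follows; C<process_exists> is a
hypothetical helper, and the C<eval> covers platforms where C<kill> is
not implemented at all.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the process can be signalled, 0 if not,
# undef if kill() is unimplemented here.  On Unix a 0 may also mean
# that the process exists but belongs to another user.
sub process_exists {
    my ($pid) = @_;
    return eval { kill(0, $pid) ? 1 : 0 };
}

my $self = process_exists($$);
print defined $self ? "own process visible: $self\n"
                    : "kill() is not implemented here\n";
```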

=item link OLDFILE,NEWFILE

Not implemented. (S<Mac OS>, MPE/iX, VMS, S<RISC OS>)

Link count not updated because hard links are not quite that hard
(They are sort of half-way between hard and soft links). (AmigaOS)

Hard links are implemented on Win32 (Windows NT and Windows 2000)
under NTFS only.
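
Where a second name for a file is merely convenient rather than
required, a program can try C<link> and fall back to copying.  This is
a sketch only; C<link_or_copy> and the file names are hypothetical, and
a copy is of course not equivalent to a hard link if the file is
modified later.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: hard-link $old to $new, or copy if that fails.
sub link_or_copy {
    my ($old, $new) = @_;
    my $linked = eval { link($old, $new) };   # dies where unimplemented
    return 'link' if $linked;
    copy($old, $new) or die "Can't copy $old to $new: $!";
    return 'copy';
}

my $dir = tempdir(CLEANUP => 1);
my $old = File::Spec->catfile($dir, 'original.txt');
my $new = File::Spec->catfile($dir, 'second-name.txt');

open(my $fh, '>', $old) or die "Can't create $old: $!";
print {$fh} "some data\n";
close($fh) or die "Can't close $old: $!";

my $how = link_or_copy($old, $new);
print "made $new by $how\n";
```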

=item lstat FILEHANDLE

=item lstat EXPR

=item lstat

Not implemented. (VMS, S<RISC OS>)

Return values (especially for device and inode) may be bogus. (Win32)

=item msgctl ID,CMD,ARG

=item msgget KEY,FLAGS

=item msgsnd ID,MSG,FLAGS

=item msgrcv ID,VAR,SIZE,TYPE,FLAGS

Not implemented. (S<Mac OS>, Win32, VMS, Plan9, S<RISC OS>, VOS)

=item open FILEHANDLE,EXPR

=item open FILEHANDLE

The C<|> variants are supported only if ToolServer is installed.
(S<Mac OS>)

open to C<|-> and C<-|> are unsupported. (S<Mac OS>, Win32, S<RISC OS>)

Opening a process does not automatically flush output handles on some
platforms.  (SunOS, Solaris, HP-UX)

=item pipe READHANDLE,WRITEHANDLE

Very limited functionality. (MiNT)

=item readlink EXPR

=item readlink

Not implemented. (Win32, VMS, S<RISC OS>)

=item select RBITS,WBITS,EBITS,TIMEOUT

Only implemented on sockets. (Win32, VMS)

Only reliable on sockets. (S<RISC OS>)

Note that the C<select FILEHANDLE> form is generally portable.

=item semctl ID,SEMNUM,CMD,ARG

=item semget KEY,NSEMS,FLAGS

=item semop KEY,OPSTRING

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS)

=item setgrent

Not implemented. (MPE/iX, Win32)

=item setpgrp PID,PGRP

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS)

=item setpriority WHICH,WHO,PRIORITY

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS)

=item setpwent

Not implemented. (MPE/iX, Win32)

=item setsockopt SOCKET,LEVEL,OPTNAME,OPTVAL

Not implemented. (Plan9)

=item shmctl ID,CMD,ARG

=item shmget KEY,SIZE,FLAGS

=item shmread ID,VAR,POS,SIZE

=item shmwrite ID,STRING,POS,SIZE

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS)

=item sockatmark SOCKET

A relatively recent addition to the socket functions; it may not
be implemented even on UNIX platforms.

=item socketpair SOCKET1,SOCKET2,DOMAIN,TYPE,PROTOCOL

Not implemented. (Win32, VMS, S<RISC OS>, VOS, VM/ESA)

=item stat FILEHANDLE

=item stat EXPR

=item stat

Platforms that do not have rdev, blksize, or blocks will return these
as '', so numeric comparison or manipulation of these fields may cause
'not numeric' warnings.
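
One way to sidestep those warnings is to normalize the three fields
before doing arithmetic on them.  The sketch below assumes that treating
a missing value as C<0> is acceptable for the caller; C<numeric_stat> is
a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the stat list with rdev, blksize, and
# blocks forced to 0 when the platform reports them as ''.
sub numeric_stat {
    my ($file) = @_;
    my @st = stat($file) or return;
    for my $index (6, 11, 12) {          # rdev, blksize, blocks
        $st[$index] = 0 unless defined $st[$index] && length $st[$index];
    }
    return @st;
}

my @st = numeric_stat($^X);              # the running perl, as a sample file
print "blocks: $st[12]\n" if @st;
```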

mtime and atime are the same thing, and ctime is creation time instead of
inode change time.  (S<Mac OS>)

ctime not supported on UFS.  (S<Mac OS X>)

ctime is creation time instead of inode change time.  (Win32)

device and inode are not meaningful.  (Win32)

device and inode are not necessarily reliable.  (VMS)

mtime, atime and ctime all return the last modification time.  Device and
inode are not necessarily reliable.  (S<RISC OS>)

dev, rdev, blksize, and blocks are not available.  inode is not
meaningful and will differ between stat calls on the same file.  (os2)

Some versions of Cygwin, when doing a stat("foo") and not finding it,
may then attempt to stat("foo.exe").  (Cygwin)

=item symlink OLDFILE,NEWFILE

Not implemented. (Win32, VMS, S<RISC OS>)
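
Whether C<symlink> exists can be tested at run time without creating
anything, using the C<eval> idiom that the C<symlink> entry in
L<perlfunc> describes:

```perl
use strict;
use warnings;

# The call itself fails harmlessly; what matters is whether it dies
# because symlink() is unimplemented.
my $symlink_exists = eval { symlink('', ''); 1 } ? 1 : 0;

print $symlink_exists
    ? "symlink() is available\n"
    : "symlink() is not implemented here\n";
```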

=item syscall LIST

Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS, VM/ESA)

=item sysopen FILEHANDLE,FILENAME,MODE,PERMS

The traditional "0", "1", and "2" MODEs are implemented with different
numeric values on some systems.  The flags exported by C<Fcntl>
(O_RDONLY, O_WRONLY, O_RDWR) should work everywhere though.  (S<Mac
OS>, OS/390, VM/ESA)
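
As a minimal sketch of the portable form, the following opens a file
for writing using the symbolic C<Fcntl> flags rather than literal
numbers.  The file name F<example.txt> and the helper name
C<open_for_write> are illustrative assumptions, not part of any API.

    use strict;
    use Fcntl qw(O_WRONLY O_CREAT O_TRUNC);

    # Hypothetical helper: open $path for writing with symbolic flags,
    # never with the literal MODE values 0, 1, or 2.
    sub open_for_write {
        my ($path) = @_;
        sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_TRUNC, 0644)
            or die "Can't sysopen $path: $!";
        return $fh;
    }

    my $fh = open_for_write('example.txt');
    print $fh "hello\n";
    close($fh) or die "Can't close example.txt: $!";

Because the flag values come from the platform's own headers via
C<Fcntl>, the same source should behave identically wherever the
numeric values differ.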

=item system LIST

In general, do not assume the UNIX/POSIX semantics that you can shift
C<$?> right by eight to get the exit value, or that C<$? & 127>
would give you the number of the signal that terminated the program,
or that C<$? & 128> would test true if the program was terminated by a
coredump.  Instead, use the POSIX W*() interfaces: for example, use
WIFEXITED($?) and WEXITSTATUS($?) to test for a normal exit and the exit
value, and WIFSIGNALED($?) and WTERMSIG($?) for a signal exit and the
signal.  Core dumping is not a portable concept, so there's no portable
way to test for that.
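
The advice above might be followed along these lines.  This is only a
sketch: the helper name C<describe_status> and the child command are
assumptions made for illustration, and the macros are imported through
the C<:sys_wait_h> tag of the C<POSIX> module.

    use strict;
    use POSIX qw(:sys_wait_h);

    # Hypothetical helper: describe a wait status such as $? without
    # shifting or masking it by hand.
    sub describe_status {
        my ($status) = @_;
        return "could not run the command" if $status == -1;
        return "exited with value " . WEXITSTATUS($status)
            if WIFEXITED($status);
        return "killed by signal " . WTERMSIG($status)
            if WIFSIGNALED($status);
        return "stopped or otherwise unavailable";
    }

    system($^X, '-e', 'exit 3');    # run this same perl as the child
    print describe_status($?), "\n";

Keeping the decoding in one helper means that only one place needs to
change if a platform requires different handling.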

Only implemented if ToolServer is installed. (S<Mac OS>)

As an optimization, may not call the command shell specified in
C<$ENV{PERL5SHELL}>.  C<system(1, @args)> spawns an external
process and immediately returns its process designator, without
waiting for it to terminate.  Return value may be used subsequently
in C<wait> or C<waitpid>.  Failure to spawn() a subprocess is indicated
by setting $? to "255 << 8".  C<$?> is set in a way compatible with
Unix (i.e. the exitstatus of the subprocess is obtained by "$? >> 8",
as described in the documentation).  (Win32)

There is no shell to process metacharacters, and the native standard is
to pass a command line terminated by "\n" "\r" or "\0" to the spawned
program.  Redirection such as C<< > foo >> is performed (if at all) by
the run time library of the spawned program.  C<system> I<list> will call
the Unix emulation library's C<exec> emulation, which attempts to provide
emulation of the stdin, stdout, stderr in force in the parent, providing
the child program uses a compatible version of the emulation library.
I<scalar> will call the native command line direct and no such emulation
of a child Unix program will exist.  Mileage B<will> vary.  (S<RISC OS>)

Far from being POSIX compliant.  Because there may be no underlying
/bin/sh, C<system> tries to work around the problem by forking and
execing the first token in its argument string.  Handles basic
redirection ("<" or ">") on its own behalf. (MiNT)

Does not automatically flush output handles on some platforms.
(SunOS, Solaris, HP-UX)

The return value is POSIX-like (shifted up by 8 bits), which only allows
room for a made-up value derived from the severity bits of the native
32-bit condition code (unless overridden by C<use vmsish 'status'>). 
For more details see L<perlvms/$?>. (VMS)

=item times

Only the first entry returned is nonzero. (S<Mac OS>)

"cumulative" times will be bogus.  On anything other than Windows NT
or Windows 2000, "system" time will be bogus, and "user" time is
actually the time returned by the clock() function in the C runtime
library. (Win32)

Not useful. (S<RISC OS>)

=item truncate FILEHANDLE,LENGTH

=item truncate EXPR,LENGTH

Not implemented. (Older versions of VMS)

Truncation to zero-length only. (VOS)

If a FILEHANDLE is supplied, it must be writable and opened in append
mode (i.e., use C<<< open(FH, '>>filename') >>>
or C<sysopen(FH,...,O_APPEND|O_RDWR)>).  If a filename is supplied, it
should not be held open elsewhere. (Win32)

=item umask EXPR

=item umask

Returns undef where unavailable, as of version 5.005.

C<umask> works but the correct permissions are set only when the file
is finally closed. (AmigaOS)

=item utime LIST

Only the modification time is updated. (S<BeOS>, S<Mac OS>, VMS, S<RISC OS>)

May not behave as expected.  Behavior depends on the C runtime
library's implementation of utime(), and the filesystem being
used.  The FAT filesystem typically does not support an "access
time" field, and it may limit timestamps to a granularity of
two seconds. (Win32)

=item wait

=item waitpid PID,FLAGS

Not implemented. (S<Mac OS>, VOS)

Can only be applied to process handles returned for processes spawned
using C<system(1, ...)> or pseudo processes created with C<fork()>. (Win32)

Not useful. (S<RISC OS>)

=back

=head1 CHANGES

=over 4

=item v1.48, 02 February 2001

Various updates from perl5-porters over the past year, supported
platforms update from Jarkko Hietaniemi.

=item v1.47, 22 March 2000

Various cleanups from Tom Christiansen, including migration of 
long platform listings from L<perl>.

=item v1.46, 12 February 2000

Updates for VOS and MPE/iX. (Peter Prymmer)  Other small changes.

=item v1.45, 20 December 1999

Small changes from 5.005_63 distribution, more changes to EBCDIC info.

=item v1.44, 19 July 1999

A bunch of updates from Peter Prymmer for C<$^O> values,
endianness, File::Spec, VMS, BS2000, OS/400.

=item v1.43, 24 May 1999

Added a lot of cleaning up from Tom Christiansen.

=item v1.42, 22 May 1999

Added notes about tests, sprintf/printf, and epoch offsets.

=item v1.41, 19 May 1999

Lots more little changes to formatting and content.

Added a bunch of C<$^O> and related values
for various platforms; fixed mail and web addresses, and added
and changed miscellaneous notes.  (Peter Prymmer)

=item v1.40, 11 April 1999

Miscellaneous changes.

=item v1.39, 11 February 1999

Changes from Jarkko and EMX URL fixes from Michael Schwern.  Additional
note about newlines added.

=item v1.38, 31 December 1998

More changes from Jarkko.

=item v1.37, 19 December 1998

More minor changes.  Merge two separate version 1.35 documents.

=item v1.36, 9 September 1998

Updated for Stratus VOS.  Also known as version 1.35.

=item v1.35, 13 August 1998

Integrate more minor changes, plus addition of new sections under
L<"ISSUES">: L<"Numbers endianness and Width">,
L<"Character sets and character encoding">,
L<"Internationalisation">.

=item v1.33, 06 August 1998

Integrate more minor changes.

=item v1.32, 05 August 1998

Integrate more minor changes.

=item v1.30, 03 August 1998

Major update for RISC OS, other minor changes.

=item v1.23, 10 July 1998

First public release with perl5.005.

=back

=head1 Supported Platforms

As of early 2001 (the Perl releases 5.6.1 and 5.7.1), the following
platforms are able to build Perl from the standard source code
distribution available at http://www.cpan.org/src/index.html

	AIX
	AmigaOS
	Darwin		(Mac OS X)
	DG/UX
	DOS DJGPP 	1)
	DYNIX/ptx
	EPOC
	FreeBSD
	HP-UX
	IRIX
	Linux
	MachTen
	MacOS Classic	2)
	NonStop-UX
	ReliantUNIX	(SINIX)
	OpenBSD
	OpenVMS		(VMS)
	OS/2
	OS X
	QNX
	Solaris
	Tru64 UNIX      (DEC OSF/1, Digital UNIX)
	UNICOS
	UNICOS/mk
	VOS
	Win32/NT/2K	3)

        1) in DOS mode either the DOS or OS/2 ports can be used
        2) Mac OS Classic (pre-X) is almost 5.6.1-ready; building from
	   the source does work with 5.6.1, but additional MacOS specific
           source code is needed for a complete build.  See the web
           site http://dev.macperl.org/ for more information.
        3) compilers: Borland, Cygwin, Mingw32 EGCS/GCC, VC++

The following platforms worked for the previous releases (5.6.0 and 5.7.0),
but we did not manage to test these in time for the 5.7.1 release.
There is a very good chance that these will work fine with 5.7.1.

	DomainOS
	Hurd
	LynxOS
	MinGW
	MPE/iX
	NetBSD
	PowerMAX
	SCO SV
	SunOS
	SVR4
	Unixware
	Windows 3.1
	Windows 95
	Windows 98
	Windows Me

The following platform worked for the 5.005_03 major release but not
for 5.6.0.  Standardization on UTF-8 as the internal string
representation in 5.6.0 and 5.6.1 introduced incompatibilities in this
EBCDIC platform.  While Perl 5.7.1 will build on this platform some
regression tests may fail and the C<use utf8;> pragma typically
introduces text handling errors.

	OS/390	1)

	1) previously known as MVS, about to become z/OS.

Strongly related to the OS/390 platform by also being EBCDIC-based
mainframe platforms are the following platforms:

	POSIX-BC	(BS2000)
	VM/ESA

These are also expected to work, albeit with no UTF-8 support, under 5.6.1 
for the same reasons as OS/390.  Contact the mailing list perl-mvs@perl.org 
for more details.

The following platforms have been known to build Perl from source in
the past (5.005_03 and earlier), but we haven't been able to verify
their status for the current release, either because the
hardware/software platforms are rare or because we don't have an
active champion on these platforms--or both.  They used to work,
though, so go ahead and try compiling them, and let perlbug@perl.org
know of any trouble.

	3b1
	A/UX
	BeOS
	BSD/OS
	ConvexOS
	CX/UX
	DC/OSx
	DDE SMES
	DOS EMX
	Dynix
	EP/IX
	ESIX
	FPS
	GENIX
	Greenhills
	ISC
	MachTen 68k
	MiNT
	MPC
	NEWS-OS
	NextSTEP
	OpenSTEP
	Opus
	Plan 9
	PowerUX
	RISC/os
	SCO ODT/OSR	
	Stellar
	SVR2
	TI1500
	TitanOS
	Ultrix
	Unisys Dynix
	Unixware
	UTS

Support for the following platform is planned for a future Perl release:

	Netware

The following platforms have their own source code distributions and
binaries available via http://www.cpan.org/ports/index.html:

				Perl release

	Netware			5.003_07
	OS/400			5.005_02
	Tandem Guardian		5.004

The following platforms have only binaries available via
http://www.cpan.org/ports/index.html :

				Perl release

	Acorn RISCOS		5.005_02
	AOS			5.002
	LynxOS			5.004_02

Although we do suggest that you always build your own Perl from
the source code, both for maximal configurability and for security,
in case you are in a hurry you can check
http://www.cpan.org/ports/index.html for binary distributions.

=head1 SEE ALSO

L<perlaix>, L<perlapollo>, L<perlamiga>, L<perlbeos>, L<perlbs2000>,
L<perlce>, L<perlcygwin>, L<perldgux>, L<perldos>, L<perlepoc>, L<perlebcdic>,
L<perlhurd>, L<perlhpux>, L<perlmachten>, L<perlmacos>, L<perlmint>,
L<perlmpeix>, L<perlnetware>, L<perlos2>, L<perlos390>, L<perlplan9>,
L<perlqnx>, L<perlsolaris>, L<perltru64>, L<perlunicode>,
L<perlvmesa>, L<perlvms>, L<perlvos>, L<perlwin32>, and L<Win32>.

=head1 AUTHORS / CONTRIBUTORS

Abigail <abigail@foad.org>,
Charles Bailey <bailey@newman.upenn.edu>,
Graham Barr <gbarr@pobox.com>,
Tom Christiansen <tchrist@perl.com>,
Nicholas Clark <nick@ccl4.org>,
Thomas Dorner <Thomas.Dorner@start.de>,
Andy Dougherty <doughera@lafayette.edu>,
Dominic Dunlop <domo@computer.org>,
Neale Ferguson <neale@vma.tabnsw.com.au>,
David J. Fiander <davidf@mks.com>,
Paul Green <Paul_Green@stratus.com>,
M.J.T. Guy <mjtg@cam.ac.uk>,
Jarkko Hietaniemi <jhi@iki.fi>,
Luther Huffman <lutherh@stratcom.com>,
Nick Ing-Simmons <nick@ing-simmons.net>,
Andreas J. KE<ouml>nig <a.koenig@mind.de>,
Markus Laker <mlaker@contax.co.uk>,
Andrew M. Langmead <aml@world.std.com>,
Larry Moore <ljmoore@freespace.net>,
Paul Moore <Paul.Moore@uk.origin-it.com>,
Chris Nandor <pudge@pobox.com>,
Matthias Neeracher <neeracher@mac.com>,
Philip Newton <pne@cpan.org>,
Gary Ng <71564.1743@CompuServe.COM>,
Tom Phoenix <rootbeer@teleport.com>,
AndrE<eacute> Pirard <A.Pirard@ulg.ac.be>,
Peter Prymmer <pvhp@forte.com>,
Hugo van der Sanden <hv@crypt0.demon.co.uk>,
Gurusamy Sarathy <gsar@activestate.com>,
Paul J. Schinder <schinder@pobox.com>,
Michael G Schwern <schwern@pobox.com>,
Dan Sugalski <dan@sidhe.org>,
Nathan Torkington <gnat@frii.com>.

=head1 NAME

perlre - Perl regular expressions

=head1 DESCRIPTION

This page describes the syntax of regular expressions in Perl.  For a
description of how to I<use> regular expressions in matching
operations, plus various examples of the same, see discussions
of C<m//>, C<s///>, C<qr//> and C<??> in L<perlop/"Regexp Quote-Like Operators">.

Matching operations can have various modifiers.  Modifiers
that relate to the interpretation of the regular expression inside
are listed below.  Modifiers that alter the way a regular expression
is used by Perl are detailed in L<perlop/"Regexp Quote-Like Operators"> and 
L<perlop/"Gory details of parsing quoted constructs">.

=over 4

=item i

Do case-insensitive pattern matching.

If C<use locale> is in effect, the case map is taken from the current
locale.  See L<perllocale>.

=item m

Treat string as multiple lines.  That is, change "^" and "$" from matching
the start or end of the string to matching the start or end of any
line anywhere within the string.

=item s

Treat string as single line.  That is, change "." to match any character
whatsoever, even a newline, which normally it would not match.

The C</s> and C</m> modifiers both override the C<$*> setting.  That
is, no matter what C<$*> contains, C</s> without C</m> will force
"^" to match only at the beginning of the string and "$" to match
only at the end (or just before a newline at the end) of the string.
Together, as /ms, they let the "." match any character whatsoever,
while still allowing "^" and "$" to match, respectively, just after
and just before newlines within the string.

=item x

Extend your pattern's legibility by permitting whitespace and comments.

=back

These are usually written as "the C</x> modifier", even though the delimiter
in question might not really be a slash.  Any of these
modifiers may also be embedded within the regular expression itself using
the C<(?...)> construct.  See below.

The C</x> modifier itself needs a little more explanation.  It tells
the regular expression parser to ignore whitespace that is neither
backslashed nor within a character class.  You can use this to break up
your regular expression into (slightly) more readable parts.  The C<#>
character is also treated as a metacharacter introducing a comment,
just as in ordinary Perl code.  This also means that if you want real
whitespace or C<#> characters in the pattern (outside a character
class, where they are unaffected by C</x>), you'll either have to
escape them or encode them using octal or hex escapes.  Taken together,
these features go a long way towards making Perl's regular expressions
more readable.  Note that you have to be careful not to include the
pattern delimiter in the comment--perl has no way of knowing you did
not intend to close the pattern early.  See the C-comment deletion code
in L<perlop>.
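
As a small illustration of C</x>, here is a sketch of a commented
pattern.  The variable name C<$time_re> and the sample input are
assumptions made for this example only.  Note the escaped C<\#>, which
matches a literal C<#> instead of starting a comment.

    use strict;

    # Match a time of day followed by a remark, as in "12:34:56 # lunch".
    # Whitespace and comments inside the pattern are ignored under /x.
    my $time_re = qr{
        ^ (\d\d)        # hours
        : (\d\d)        # minutes
        : (\d\d)        # seconds
        \s* \# \s*      # a literal hash sign, escaped
        (.*) $          # the trailing remark
    }x;

    if ("12:34:56 # lunch" =~ $time_re) {
        print "hours=$1 minutes=$2 seconds=$3 remark=$4\n";
    }

The comments deliberately avoid the pattern delimiters (curly brackets
here), for the reason given above.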

=head2 Regular Expressions

The patterns used in Perl pattern matching derive from those supplied in
the Version 8 regex routines.  (The routines are derived
(distantly) from Henry Spencer's freely redistributable reimplementation
of the V8 routines.)  See L<Version 8 Regular Expressions> for
details.

In particular the following metacharacters have their standard I<egrep>-ish
meanings:

    \	Quote the next metacharacter
    ^	Match the beginning of the line
    .	Match any character (except newline)
    $	Match the end of the line (or before newline at the end)
    |	Alternation
    ()	Grouping
    []	Character class

By default, the "^" character is guaranteed to match only the
beginning of the string, the "$" character only the end (or before the
newline at the end), and Perl does certain optimizations with the
assumption that the string contains only one line.  Embedded newlines
will not be matched by "^" or "$".  You may, however, wish to treat a
string as a multi-line buffer, such that the "^" will match after any
newline within the string, and "$" will match before any newline.  At the
cost of a little more overhead, you can do this by using the /m modifier
on the pattern match operator.  (Older programs did this by setting C<$*>,
but this practice is now deprecated.)

To simplify multi-line substitutions, the "." character never matches a
newline unless you use the C</s> modifier, which in effect tells Perl to pretend
the string is a single line--even if it isn't.  The C</s> modifier also
overrides the setting of C<$*>, in case you have some (badly behaved) older
code that sets it in another module.
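
The effect of the two modifiers can be seen in a short sketch such as
the following; the sample string and variable names are illustrative
assumptions.

    use strict;

    my $text = "first line\nsecond line\n";

    # Without /m, "^" matches only at the very start of the string.
    my @plain = ($text =~ /^(\w+)/g);       # ("first")

    # With /m, "^" also matches just after each embedded newline.
    my @multi = ($text =~ /^(\w+)/mg);      # ("first", "second")

    # Without /s, "." stops at a newline; with /s it crosses it.
    my ($short) = ($text =~ /(first.*)/);   # "first line"
    my ($long)  = ($text =~ /(first.*)/s);  # the whole string

    print scalar(@plain), " ", scalar(@multi), "\n";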

The following standard quantifiers are recognized:

    *	   Match 0 or more times
    +	   Match 1 or more times
    ?	   Match 1 or 0 times
    {n}    Match exactly n times
    {n,}   Match at least n times
    {n,m}  Match at least n but not more than m times

(If a curly bracket occurs in any other context, it is treated
as a regular character.)  The "*" quantifier is equivalent to C<{0,}>, the "+"
quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>.  n and m are limited
to integral values less than a preset limit defined when perl is built.
This is usually 32766 on the most common platforms.  The actual limit can
be seen in the error message generated by code such as this:

    $_ **= $_ , / {$_} / for 2 .. 42;

By default, a quantified subpattern is "greedy", that is, it will match as
many times as possible (given a particular starting location) while still
allowing the rest of the pattern to match.  If you want it to match the
minimum number of times possible, follow the quantifier with a "?".  Note
that the meanings don't change, just the "greediness":

    *?	   Match 0 or more times
    +?	   Match 1 or more times
    ??	   Match 0 or 1 time
    {n}?   Match exactly n times
    {n,}?  Match at least n times
    {n,m}? Match at least n but not more than m times
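
The difference between the greedy and minimal forms can be sketched as
follows; the sample string is an assumption chosen for illustration.

    use strict;

    my $html = "<b>bold</b> and <i>italic</i>";

    # Greedy: ".+" runs to the last ">" that still lets the match succeed.
    my ($greedy)  = ($html =~ /<(.+)>/);    # "b>bold</b> and <i>italic</i"

    # Minimal: ".+?" stops at the first ">" that lets the match succeed.
    my ($minimal) = ($html =~ /<(.+?)>/);   # "b"

    print "$greedy\n$minimal\n";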

Because patterns are processed as double quoted strings, the following
also work:

    \t		tab                   (HT, TAB)
    \n		newline               (LF, NL)
    \r		return                (CR)
    \f		form feed             (FF)
    \a		alarm (bell)          (BEL)
    \e		escape (think troff)  (ESC)
    \033	octal char (think of a PDP-11)
    \x1B	hex char
    \x{263a}	wide hex char         (Unicode SMILEY)
    \c[		control char
    \N{name}	named char
    \l		lowercase next char (think vi)
    \u		uppercase next char (think vi)
    \L		lowercase till \E (think vi)
    \U		uppercase till \E (think vi)
    \E		end case modification (think vi)
    \Q		quote (disable) pattern metacharacters till \E

If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
and C<\U> is taken from the current locale.  See L<perllocale>.  For
documentation of C<\N{name}>, see L<charnames>.

You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
An unescaped C<$> or C<@> interpolates the corresponding variable,
while escaping will cause the literal string C<\$> to be matched.
You'll need to write something like C<m/\Quser\E\@\Qhost/>.

In addition, Perl defines the following:

    \w	Match a "word" character (alphanumeric plus "_")
    \W	Match a non-"word" character
    \s	Match a whitespace character
    \S	Match a non-whitespace character
    \d	Match a digit character
    \D	Match a non-digit character
    \pP	Match P, named property.  Use \p{Prop} for longer names.
    \PP	Match non-P
    \X	Match eXtended Unicode "combining character sequence",
        equivalent to (?:\PM\pM*)
    \C	Match a single C char (octet) even under utf8.

A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
Use C<\w+> to match a string of Perl-identifier characters (which isn't 
the same as matching an English word).  If C<use locale> is in effect, the
list of alphabetic characters generated by C<\w> is taken from the
current locale.  See L<perllocale>.  You may use C<\w>, C<\W>, C<\s>, C<\S>,
C<\d>, and C<\D> within character classes, but if you try to use them
as endpoints of a range, no range is formed: the "-" is understood literally.
See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>.

The POSIX character class syntax

    [:class:]

is also available.  The available classes and their backslash
equivalents (if available) are as follows:

    alpha
    alnum
    ascii
    blank		[1]
    cntrl
    digit       \d
    graph
    lower
    print
    punct
    space       \s	[2]
    upper
    word        \w	[3]
    xdigit

  [1] A GNU extension equivalent to C<[ \t]>, `all horizontal whitespace'.
  [2] Not I<exactly equivalent> to C<\s> since the C<[[:space:]]> includes
      also the (very rare) `vertical tabulator', "\ck", chr(11).
  [3] A Perl extension. 

For example use C<[:upper:]> to match all the uppercase characters.
Note that the C<[]> are part of the C<[::]> construct, not part of the
whole character class.  For example:

    [01[:alpha:]%]

matches zero, one, any alphabetic character, or the percent sign.
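
A brief sketch of the POSIX classes in use follows.  The sample strings
and variable names are assumptions, and the results noted in the
comments assume plain ASCII input.

    use strict;

    my $input = "Hello, World 42!";

    # The [:class:] names are only valid inside a bracketed class.
    my @upper = ($input =~ /([[:upper:]])/g);               # ("H", "W")

    # Several classes can share one set of brackets.
    (my $compact = $input) =~ s/[[:punct:][:space:]]+//g;   # "HelloWorld42"

    # Negated class: the leading run of non-digits.
    my ($prefix) = ("abc123" =~ /^([[:^digit:]]+)/);        # "abc"

    print "@upper $compact $prefix\n";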

If the C<utf8> pragma is used, the following equivalences to Unicode
\p{} constructs and equivalent backslash character classes (if available),
will hold:

    alpha       IsAlpha
    alnum       IsAlnum
    ascii       IsASCII
    blank       IsSpace
    cntrl       IsCntrl
    digit       IsDigit        \d
    graph       IsGraph
    lower       IsLower
    print       IsPrint
    punct       IsPunct
    space       IsSpace
                IsSpacePerl    \s
    upper       IsUpper
    word        IsWord
    xdigit      IsXDigit

For example C<[:lower:]> and C<\p{IsLower}> are equivalent.

If the C<utf8> pragma is not used but the C<locale> pragma is, the
classes correlate with the usual isalpha(3) interface (except for
`word' and `blank').

The classes whose names may not be self-explanatory are:

=over 4

=item cntrl

Any control character.  Usually characters that don't produce output as
such but instead control the terminal somehow: for example newline and
backspace are control characters.  All characters with ord() less than
32 are most often classified as control characters (assuming ASCII,
the ISO Latin character sets, and Unicode).

=item graph

Any alphanumeric or punctuation (special) character.

=item print

Any alphanumeric or punctuation (special) character or space.

=item punct

Any punctuation (special) character.

=item xdigit

Any hexadecimal digit.  Though this may feel silly ([0-9A-Fa-f] would
work just fine) it is included for completeness.

=back

You can negate the [::] character classes by prefixing the class name
with a '^'. This is a Perl extension.  For example:

    POSIX           trad. Perl  utf8 Perl

    [:^digit:]      \D          \P{IsDigit}
    [:^space:]      \S          \P{IsSpace}
    [:^word:]       \W          \P{IsWord}

The POSIX character classes [.cc.] and [=cc=] are recognized but
B<not> supported and trying to use them will cause an error.

Perl defines the following zero-width assertions:

    \b	Match a word boundary
    \B	Match a non-(word boundary)
    \A	Match only at beginning of string
    \Z	Match only at end of string, or before newline at the end
    \z	Match only at end of string
    \G	Match only at pos() (e.g. at the end-of-match position
        of prior m//g)

A word boundary (C<\b>) is a spot between two characters
that has a C<\w> on one side of it and a C<\W> on the other side
of it (in either order), counting the imaginary characters off the
beginning and end of the string as matching a C<\W>.  (Within
character classes C<\b> represents backspace rather than a word
boundary, just as it normally does in any double-quoted string.)
The C<\A> and C<\Z> are just like "^" and "$", except that they
won't match multiple times when the C</m> modifier is used, while
"^" and "$" will match at every internal line boundary.  To match
the actual end of the string and not ignore an optional trailing
newline, use C<\z>.
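
The distinctions drawn above can be checked with a sketch like this
one; the sample strings and variable names are illustrative
assumptions.

    use strict;

    my $line = "data\n";

    # "$" and \Z both tolerate a single trailing newline; \z does not.
    my $dollar  = ($line =~ /data$/)  ? 1 : 0;   # 1
    my $big_z   = ($line =~ /data\Z/) ? 1 : 0;   # 1
    my $small_z = ($line =~ /data\z/) ? 1 : 0;   # 0

    # \b keeps "cat" from matching inside "concatenate".
    my @cats = ("cat concatenate cat." =~ /\bcat\b/g);   # two matches

    print "$dollar $big_z $small_z ", scalar(@cats), "\n";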

The C<\G> assertion can be used to chain global matches (using
C<m//g>), as described in L<perlop/"Regexp Quote-Like Operators">.
It is also useful when writing C<lex>-like scanners, when you have
several patterns that you want to match against consecutive substrings
of your string; see the previous reference.  The actual location
where C<\G> will match can also be influenced by using C<pos()> as
an lvalue.  See L<perlfunc/pos>.
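
A C<lex>-like scanner built on C<\G> might look like the following
sketch.  The token names and the helper C<tokenize> are hypothetical.
The C</c> modifier keeps C<pos()> unchanged when a match fails, so the
next alternative is tried at the same position.

    use strict;

    # Hypothetical tokenizer for a tiny arithmetic language.
    sub tokenize {
        my ($src) = @_;
        my @tokens;
        pos($src) = 0;
        while (pos($src) < length($src)) {
            if    ($src =~ /\G\s+/gc)          { next }
            elsif ($src =~ /\G(\d+)/gc)        { push @tokens, "NUM($1)" }
            elsif ($src =~ /\G([-+*\/()])/gc)  { push @tokens, "OP($1)" }
            else {
                die "Unexpected character at offset " . pos($src) . "\n";
            }
        }
        return @tokens;
    }

    print join(" ", tokenize("12 + (3*45)")), "\n";

Each pattern is anchored with C<\G>, so no input can be silently
skipped between tokens.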

The bracketing construct C<( ... )> creates capture buffers.  To
refer to the digit'th buffer use \<digit> within the
match.  Outside the match use "$" instead of "\".  (The
\<digit> notation works in certain circumstances outside 
the match.  See the warning below about \1 vs $1 for details.)
Referring back to another part of the match is called a
I<backreference>.

There is no limit to the number of captured substrings that you may
use.  However Perl also uses \10, \11, etc. as aliases for \010,
\011, etc.  (Recall that 0 means octal, so \011 is the character at
number 9 in your coded character set; which would be the 10th character,
a horizontal tab under ASCII.)  Perl resolves this 
ambiguity by interpreting \10 as a backreference only if at least 10 
left parentheses have opened before it.  Likewise \11 is a 
backreference only if at least 11 left parentheses have opened 
before it.  And so on.  \1 through \9 are always interpreted as 
backreferences.

Examples:

    s/^([^ ]*) *([^ ]*)/$2 $1/;     # swap first two words

    if (/(.)\1/) {                  # find first doubled char
        print "'$1' is the first doubled character\n";
    }

    if (/Time: (..):(..):(..)/) {   # parse out values
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }

Several special variables also refer back to portions of the previous
match.  C<$+> returns whatever the last bracket match matched.
C<$&> returns the entire matched string.  (At one point C<$0> did
also, but now it returns the name of the program.)  C<$`> returns
everything before the matched string.  And C<$'> returns everything
after the matched string.

The numbered variables ($1, $2, $3, etc.) and the related punctuation
set (C<$+>, C<$&>, C<$`>, and C<$'>) are all dynamically scoped
until the end of the enclosing block or until the next successful
match, whichever comes first.  (See L<perlsyn/"Compound Statements">.)

B<WARNING>: Once Perl sees that you need one of C<$&>, C<$`>, or
C<$'> anywhere in the program, it has to provide them for every
pattern match.  This may substantially slow your program.  Perl
uses the same mechanism to produce $1, $2, etc, so you also pay a
price for each pattern that contains capturing parentheses.  (To
avoid this cost while retaining the grouping behaviour, use the
extended regular expression C<(?: ... )> instead.)  But if you never
use C<$&>, C<$`> or C<$'>, then patterns I<without> capturing
parentheses will not be penalized.  So avoid C<$&>, C<$'>, and C<$`>
if you can, but if you can't (and some algorithms really appreciate
them), once you've used them once, use them at will, because you've
already paid the price.  As of 5.005, C<$&> is not so costly as the
other two.

Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
C<\w>, C<\n>.  Unlike some other regular expression languages, there
are no backslashed symbols that aren't alphanumeric.  So anything
that looks like \\, \(, \), \<, \>, \{, or \} is always
interpreted as a literal character, not a metacharacter.  This was
once used in a common idiom to disable or quote the special meanings
of regular expression metacharacters in a string that you want to
use for a pattern. Simply quote all non-"word" characters:

    $pattern =~ s/(\W)/\\$1/g;

(If C<use locale> is set, then this depends on the current locale.)
Today it is more common to use the quotemeta() function or the C<\Q>
metaquoting escape sequence to disable all metacharacters' special
meanings like this:

    /$unquoted\Q$quoted\E$unquoted/

Beware that if you put literal backslashes (those not inside
interpolated variables) between C<\Q> and C<\E>, double-quotish
backslash interpolation may lead to confusing results.  If you
I<need> to use literal backslashes within C<\Q...\E>,
consult L<perlop/"Gory details of parsing quoted constructs">.
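
As a sketch of why quoting matters, compare an interpolated string used
raw with the same string passed through C<\Q> or quotemeta().  The
sample strings and variable names are assumptions for illustration.

    use strict;

    my $literal = "1+1=2 (really?)";
    my $text    = "proof: 1+1=2 (really?)";

    # Unquoted, "+", "(", ")" and "?" act as metacharacters.
    my $raw_hit    = ($text =~ /$literal/)      ? 1 : 0;   # 0

    # Quoted, every character stands for itself.
    my $quoted_hit = ($text =~ /\Q$literal\E$/) ? 1 : 0;   # 1

    # quotemeta() returns the same quoting as a string.
    my $quoted = quotemeta($literal);

    print "$raw_hit $quoted_hit $quoted\n";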

=head2 Extended Patterns

Perl also defines a consistent extension syntax for features not
found in standard tools like B<awk> and B<lex>.  The syntax is a
pair of parentheses with a question mark as the first thing within
the parentheses.  The character after the question mark indicates
the extension.

The stability of these extensions varies widely.  Some have been
part of the core language for many years.  Others are experimental
and may change without warning or be completely removed.  Check
the documentation on an individual feature to verify its current
status.

A question mark was chosen for this and for the minimal-matching
construct because 1) question marks are rare in older regular
expressions, and 2) whenever you see one, you should stop and
"question" exactly what is going on.  That's psychology...

=over 10

=item C<(?#text)>

A comment.  The text is ignored.  If the C</x> modifier enables
whitespace formatting, a simple C<#> will suffice.  Note that Perl closes
the comment as soon as it sees a C<)>, so there is no way to put a literal
C<)> in the comment.

=item C<(?imsx-imsx)>

One or more embedded pattern-match modifiers.  This is particularly
useful for dynamic patterns, such as those read in from a configuration
file, read in as an argument, or specified in a table somewhere.
Consider the case where some patterns want to be case sensitive
and some do not.  The case insensitive ones merely need to include
C<(?i)> at the front of the pattern.  For example:

    $pattern = "foobar";
    if ( /$pattern/i ) { } 

    # more flexible:

    $pattern = "(?i)foobar";
    if ( /$pattern/ ) { } 

Letters after a C<-> turn those modifiers off.  These modifiers are
localized inside an enclosing group (if any).  For example,

    ( (?i) blah ) \s+ \1

will match a repeated (I<including the case>!) word C<blah> in any
case, assuming C<x> modifier, and no C<i> modifier outside this
group.
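
The localized C<(?i)> can be exercised with a sketch like the
following; the variable names and sample strings are assumptions.

    use strict;

    # (?i) applies only inside the capturing group, so the first word is
    # matched case-insensitively, while \1 must repeat it exactly.
    my $repeat = qr/ \b ( (?i) blah ) \s+ \1 \b /x;

    my $same_case  = ("Blah Blah" =~ $repeat) ? 1 : 0;   # 1
    my $mixed_case = ("Blah blah" =~ $repeat) ? 1 : 0;   # 0

    print "$same_case $mixed_case\n";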

=item C<(?:pattern)>

=item C<(?imsx-imsx:pattern)>

This is for clustering, not capturing; it groups subexpressions like
"()", but doesn't make backreferences as "()" does.  So

    @fields = split(/\b(?:a|b|c)\b/)

is like

    @fields = split(/\b(a|b|c)\b/)

but doesn't spit out extra fields.  It's also cheaper not to capture
characters if you don't need to.
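
The difference shows up as soon as the two forms are run side by side.
This sketch adds optional whitespace around the separator so that the
fields come out clean; the sample string and variable names are
assumptions.

    use strict;

    my $str = "1 a 2 b 3";

    # Clustering only: the separators are discarded.
    my @clustered = split(/\s*\b(?:a|b|c)\b\s*/, $str);   # ("1", "2", "3")

    # Capturing: each separator is returned as an extra field.
    my @captured  = split(/\s*\b(a|b|c)\b\s*/, $str);
    # ("1", "a", "2", "b", "3")

    print scalar(@clustered), " ", scalar(@captured), "\n";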

Any letters between C<?> and C<:> act as flag modifiers as with
C<(?imsx-imsx)>.  For example, 

    /(?s-i:more.*than).*million/i

is equivalent to the more verbose

    /(?:(?s-i)more.*than).*million/i

=item C<(?=pattern)>

A zero-width positive look-ahead assertion.  For example, C</\w+(?=\t)/>
matches a word followed by a tab, without including the tab in C<$&>.

=item C<(?!pattern)>

A zero-width negative look-ahead assertion.  For example C</foo(?!bar)/>
matches any occurrence of "foo" that isn't followed by "bar".  Note
however that look-ahead and look-behind are NOT the same thing.  You cannot
use this for look-behind.

If you are looking for a "bar" that isn't preceded by a "foo", C</(?!foo)bar/>
will not do what you want.  That's because the C<(?!foo)> is just saying that
the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will
match.  You would have to do something like C</(?!foo)...bar/> for that.   We
say "like" because there's the case of your "bar" not having three characters
before it.  You could cover that this way: C</(?:(?!foo)...|^.{0,2})bar/>.
Sometimes it's still easier just to say:

    if (/bar/ && $` !~ /foo$/)

For look-behind see below.

=item C<(?<=pattern)>

A zero-width positive look-behind assertion.  For example, C</(?<=\t)\w+/>
matches a word that follows a tab, without including the tab in C<$&>.
Works only for fixed-width look-behind.

=item C<(?<!pattern)>

A zero-width negative look-behind assertion.  For example C</(?<!bar)foo/>
matches any occurrence of "foo" that does not follow "bar".  Works
only for fixed-width look-behind.
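
The following sketch contrasts negative look-ahead with negative
look-behind when looking for a "bar" that is not preceded by "foo".  The
word list is invented for illustration:

    for my $w ("foobar", "bazbar", "bar") {
        my $ahead  = $w =~ /(?!foo)bar/  ? "yes" : "no";
        my $behind = $w =~ /(?<!foo)bar/ ? "yes" : "no";
        print "$w: look-ahead $ahead, look-behind $behind\n";
    }
    # foobar: look-ahead yes, look-behind no
    # bazbar: look-ahead yes, look-behind yes
    # bar: look-ahead yes, look-behind yes

Only the look-behind version rejects "foobar".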

=item C<(?{ code })>

B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.

This zero-width assertion evaluates any embedded Perl code.  It
always succeeds, and its C<code> is not interpolated.  Currently,
the rules to determine where the C<code> ends are somewhat convoluted.

The C<code> is properly scoped in the following sense: If the assertion
is backtracked (compare L<"Backtracking">), all changes introduced after
C<local>ization are undone, so that

  $_ = 'a' x 8;
  m< 
     (?{ $cnt = 0 })			# Initialize $cnt.
     (
       a 
       (?{
           local $cnt = $cnt + 1;	# Update $cnt, backtracking-safe.
       })
     )*  
     aaaa
     (?{ $res = $cnt })			# On success copy to non-localized
					# location.
   >x;

will set C<$res = 4>.  Note that after the match, $cnt returns to the globally
introduced value, because the scopes that restrict C<local> operators
are unwound.

This assertion may be used as a C<(?(condition)yes-pattern|no-pattern)>
switch.  If I<not> used in this way, the result of evaluation of
C<code> is put into the special variable C<$^R>.  This happens
immediately, so C<$^R> can be used from other C<(?{ code })> assertions
inside the same regular expression.

The assignment to C<$^R> above is properly localized, so the old
value of C<$^R> is restored if the assertion is backtracked; compare
L<"Backtracking">.

For reasons of security, this construct is forbidden if the regular
expression involves run-time interpolation of variables, unless the
perilous C<use re 'eval'> pragma has been used (see L<re>), or the
variables contain the results of the C<qr//> operator (see
L<perlop/"qr/STRING/imosx">).

This restriction is because of the wide-spread and remarkably convenient
custom of using run-time determined strings as patterns.  For example:

    $re = <>;
    chomp $re;
    $string =~ /$re/;

Before Perl knew how to execute interpolated code within a pattern,
this operation was completely safe from a security point of view,
although it could raise an exception from an illegal pattern.  If
you turn on the C<use re 'eval'>, though, it is no longer secure,
so you should only do so if you are also using taint checking.
Better yet, use the carefully constrained evaluation within a Safe
module.  See L<perlsec> for details about both these mechanisms.

=item C<(??{ code })>

B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.
A simplified version of the syntax may be introduced for commonly
used idioms.

This is a "postponed" regular subexpression.  The C<code> is evaluated
at run time, at the moment this subexpression may match.  The result
of evaluation is considered as a regular expression and matched as
if it were inserted instead of this construct.

The C<code> is not interpolated.  As before, the rules to determine
where the C<code> ends are currently somewhat convoluted.

The following pattern matches a parenthesized group:

  $re = qr{
	     \(
	     (?:
		(?> [^()]+ )	# Non-parens without backtracking
	      |
		(??{ $re })	# Group with matching parens
	     )*
	     \)
	  }x;
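
A brief usage sketch follows.  It assumes that C<$re> is a package
variable (declared here with C<our>) so that the embedded code can see it,
and it adds C<^> and C<$> anchors that are not part of the pattern itself.
The test strings are invented:

    our $re;
    $re = qr{
              \(
              (?:
                 (?> [^()]+ )    # Non-parens without backtracking
               |
                 (??{ $re })     # Group with matching parens
              )*
              \)
           }x;

    for my $s ("(a(b)c)", "(a(b c)") {
        print "$s: ", ($s =~ /^$re$/ ? "balanced" : "not balanced"), "\n";
    }
    # prints "(a(b)c): balanced", then "(a(b c): not balanced"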

=item C<< (?>pattern) >>

B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.

An "independent" subexpression, one which matches the substring
that a I<standalone> C<pattern> would match if anchored at the given
position, and it matches I<nothing other than this substring>.  This
construct is useful for optimizations of what would otherwise be
"eternal" matches, because it will not backtrack (see L<"Backtracking">).
It may also be useful in places where the "grab all you can, and do not
give anything back" semantic is desirable.

For example: C<< ^(?>a*)ab >> will never match, since C<< (?>a*) >>
(anchored at the beginning of string, as above) will match I<all>
characters C<a> at the beginning of string, leaving no C<a> for
C<ab> to match.  In contrast, C<a*ab> will match the same as C<a+b>,
since the match of the subgroup C<a*> is influenced by the following
group C<ab> (see L<"Backtracking">).  In particular, C<a*> inside
C<a*ab> will match fewer characters than a standalone C<a*>, since
this makes the tail match.

An effect similar to C<< (?>pattern) >> may be achieved by writing
C<(?=(pattern))\1>.  The look-ahead matches the same substring as a
standalone C<pattern>, and the following C<\1> eats the matched string; it
therefore makes a zero-length assertion into an analogue of C<< (?>...) >>.
(The difference between these two constructs is that the second one
uses a capturing group, thus shifting ordinals of backreferences
in the rest of a regular expression.)
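
The three forms can be compared side by side.  This is only a sketch, and
the subject string C<"aaab"> is chosen for illustration:

    for my $pat ('^a*ab', '^(?>a*)ab', '^(?=(a*))\1ab') {
        my $result = "aaab" =~ /$pat/ ? "match" : "no match";
        print "$pat => $result\n";
    }
    # ^a*ab => match
    # ^(?>a*)ab => no match
    # ^(?=(a*))\1ab => no match

In the last two patterns all three C<a>s are consumed and never given
back, so nothing is left for the trailing C<ab>.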

Consider this pattern:

    m{ \(
	  ( 
	    [^()]+		# x+
          | 
            \( [^()]* \)
          )+
       \) 
     }x

That will efficiently match a nonempty group with matching parentheses
two levels deep or less.  However, if there is no such group, it
will take virtually forever on a long string.  That's because there
are so many different ways to split a long string into several
substrings.  This is what C<(.+)+> is doing, and C<(.+)+> is similar
to a subpattern of the above pattern.  Consider how the pattern
above detects no-match on C<((()aaaaaaaaaaaaaaaaaa> in several
seconds, but that each extra letter doubles this time.  This
exponential performance will make it appear that your program has
hung.  However, a tiny change to this pattern

    m{ \( 
	  ( 
	    (?> [^()]+ )	# change x+ above to (?> x+ )
          | 
            \( [^()]* \)
          )+
       \) 
     }x

which uses C<< (?>...) >> matches exactly when the one above does (verifying
this yourself would be a productive exercise), but finishes in a fourth
of the time when used on a similar string with 1000000 C<a>s.  Be aware,
however, that this pattern currently triggers a warning message under
the C<use warnings> pragma or B<-w> switch saying it
C<"matches the null string many times">.

On simple groups, such as the pattern C<< (?> [^()]+ ) >>, a comparable
effect may be achieved by negative look-ahead, as in C<[^()]+ (?! [^()] )>.
This was only 4 times slower on a string with 1000000 C<a>s.

The "grab all you can, and do not give anything back" semantic is desirable
in many situations where at first sight a simple C<()*> looks like
the correct solution.  Suppose we parse text with comments being delimited
by C<#> followed by some optional (horizontal) whitespace.  Contrary to
its appearance, C<#[ \t]*> I<is not> the correct subexpression to match
the comment delimiter, because it may "give up" some whitespace if
the remainder of the pattern can be made to match that way.  The correct
answer is either one of these:

    (?>#[ \t]*)
    #[ \t]*(?![ \t])

For example, to grab non-empty comments into $1, one should use either
one of these:

    / (?> \# [ \t]* ) (        .+ ) /x;
    /     \# [ \t]*   ( [^ \t] .* ) /x;

Which one you pick depends on which of these expressions better reflects
the above specification of comments.

=item C<(?(condition)yes-pattern|no-pattern)>

=item C<(?(condition)yes-pattern)>

B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.

Conditional expression.  C<(condition)> should be either an integer in
parentheses (which is valid if the corresponding pair of parentheses
matched), or a look-ahead/look-behind/evaluate zero-width assertion.

For example:

    m{ ( \( )? 
       [^()]+ 
       (?(1) \) ) 
     }x

matches a chunk of non-parentheses, possibly included in parentheses
themselves.
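
Here is a sketch of that pattern applied to a few invented strings.  The
C<^> and C<$> anchors are an addition for this example, so that the whole
string has to be accounted for:

    for my $s ("(abc)", "abc", "(abc") {
        my $ok = $s =~ m{ ^ ( \( )? [^()]+ (?(1) \) ) $ }x;
        print "$s: ", ($ok ? "matches" : "fails"), "\n";
    }
    # (abc): matches
    # abc: matches
    # (abc: fails

The closing parenthesis is required only when the opening one matched.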

=back

=head2 Backtracking

NOTE: This section presents an abstract approximation of regular
expression behavior.  For a more rigorous (and complicated) view of
the rules involved in selecting a match among possible alternatives,
see L<Combining pieces together>.

A fundamental feature of regular expression matching involves the
notion called I<backtracking>, which is currently used (when needed)
by all regular expression quantifiers, namely C<*>, C<*?>, C<+>,
C<+?>, C<{n,m}>, and C<{n,m}?>.  Backtracking is often optimized
internally, but the general principle outlined here is valid.

For a regular expression to match, the I<entire> regular expression must
match, not just part of it.  So if the beginning of a pattern containing a
quantifier succeeds in a way that causes later parts in the pattern to
fail, the matching engine backs up and recalculates the beginning
part--that's why it's called backtracking.

Here is an example of backtracking:  Let's say you want to find the
word following "foo" in the string "Food is on the foo table.":

    $_ = "Food is on the foo table.";
    if ( /\b(foo)\s+(\w+)/i ) {
	print "$2 follows $1.\n";
    }

When the match runs, the first part of the regular expression (C<\b(foo)>)
finds a possible match right at the beginning of the string, and loads up
$1 with "Foo".  However, as soon as the matching engine sees that there's
no whitespace following the "Foo" that it had saved in $1, it realizes its
mistake and starts over again one character after where it had the
tentative match.  This time it goes all the way until the next occurrence
of "foo". The complete regular expression matches this time, and you get
the expected output of "table follows foo."

Sometimes minimal matching can help a lot.  Imagine you'd like to match
everything between "foo" and "bar".  Initially, you write something
like this:

    $_ =  "The food is under the bar in the barn.";
    if ( /foo(.*)bar/ ) {
	print "got <$1>\n";
    }

Which perhaps unexpectedly yields:

  got <d is under the bar in the >

That's because C<.*> was greedy, so you get everything between the
I<first> "foo" and the I<last> "bar".  Here it's more effective
to use minimal matching to make sure you get the text between a "foo"
and the first "bar" thereafter.

    if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
  got <d is under the >

Here's another example: let's say you'd like to match a number at the end
of a string, and you also want to keep the preceding part of the match.
So you write this:

    $_ = "I have 2 numbers: 53147";
    if ( /(.*)(\d*)/ ) {				# Wrong!
	print "Beginning is <$1>, number is <$2>.\n";
    }

That won't work at all, because C<.*> was greedy and gobbled up the
whole string. As C<\d*> can match on an empty string the complete
regular expression matched successfully.

    Beginning is <I have 2 numbers: 53147>, number is <>.

Here are some variants, most of which don't work:

    $_ = "I have 2 numbers: 53147";
    @pats = qw{
	(.*)(\d*)
	(.*)(\d+)
	(.*?)(\d*)
	(.*?)(\d+)
	(.*)(\d+)$
	(.*?)(\d+)$
	(.*)\b(\d+)$
	(.*\D)(\d+)$
    };

    for $pat (@pats) {
	printf "%-12s ", $pat;
	if ( /$pat/ ) {
	    print "<$1> <$2>\n";
	} else {
	    print "FAIL\n";
	}
    }

That will print out:

    (.*)(\d*)    <I have 2 numbers: 53147> <>
    (.*)(\d+)    <I have 2 numbers: 5314> <7>
    (.*?)(\d*)   <> <>
    (.*?)(\d+)   <I have > <2>
    (.*)(\d+)$   <I have 2 numbers: 5314> <7>
    (.*?)(\d+)$  <I have 2 numbers: > <53147>
    (.*)\b(\d+)$ <I have 2 numbers: > <53147>
    (.*\D)(\d+)$ <I have 2 numbers: > <53147>

As you see, this can be a bit tricky.  It's important to realize that a
regular expression is merely a set of assertions that gives a definition
of success.  There may be 0, 1, or several different ways that the
definition might succeed against a particular string.  And if there are
multiple ways it might succeed, you need to understand backtracking to
know which variety of success you will achieve.

When using look-ahead assertions and negations, this can all get even
trickier.  Imagine you'd like to find a sequence of non-digits not
followed by "123".  You might try to write that as

    $_ = "ABC123";
    if ( /^\D*(?!123)/ ) {		# Wrong!
	print "Yup, no 123 in $_\n";
    }

But that isn't going to match; at least, not the way you're hoping.  It
claims that there is no 123 in the string.  Here's a clearer picture of
why that pattern matches, contrary to popular expectations:

    $x = 'ABC123' ;
    $y = 'ABC445' ;

    print "1: got $1\n" if $x =~ /^(ABC)(?!123)/ ;
    print "2: got $1\n" if $y =~ /^(ABC)(?!123)/ ;

    print "3: got $1\n" if $x =~ /^(\D*)(?!123)/ ;
    print "4: got $1\n" if $y =~ /^(\D*)(?!123)/ ;

This prints

    2: got ABC
    3: got AB
    4: got ABC

You might have expected test 3 to fail because it seems to be a more
general purpose version of test 1.  The important difference between
them is that test 3 contains a quantifier (C<\D*>) and so can use
backtracking, whereas test 1 will not.  What's happening is
that you've asked "Is it true that at the start of $x, following 0 or more
non-digits, you have something that's not 123?"  If the pattern matcher had
let C<\D*> expand to "ABC", this would have caused the whole pattern to
fail.

The search engine will initially match C<\D*> with "ABC".  Then it will
try to match C<(?!123)> with "123", which fails.  But because
a quantifier (C<\D*>) has been used in the regular expression, the
search engine can backtrack and retry the match differently
in the hope of matching the complete regular expression.

The pattern really, I<really> wants to succeed, so it uses the
standard pattern back-off-and-retry and lets C<\D*> expand to just "AB" this
time.  Now there's indeed something following "AB" that is not
"123".  It's "C123", which suffices.

We can deal with this by using both an assertion and a negation.
We'll say that the first part in $1 must be followed both by a digit
and by something that's not "123".  Remember that the look-aheads
are zero-width expressions--they only look, but don't consume any
of the string in their match.  So rewriting this way produces what
you'd expect; that is, case 5 will fail, but case 6 succeeds:

    print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/ ;
    print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/ ;

    6: got ABC

In other words, the two zero-width assertions next to each other work as though
they're ANDed together, just as you'd use any built-in assertions:  C</^$/>
matches only if you're at the beginning of the line AND the end of the
line simultaneously.  The deeper underlying truth is that juxtaposition in
regular expressions always means AND, except when you write an explicit OR
using the vertical bar.  C</ab/> means match "a" AND (then) match "b",
although the attempted matches are made at different positions because "a"
is not a zero-width assertion, but a one-width assertion.

B<WARNING>: particularly complicated regular expressions can take
exponential time to solve because of the immense number of possible
ways they can use backtracking to try for a match.  For example, without
internal optimizations done by the regular expression engine, this will
take a painfully long time to run:

    'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/

And if you used C<*>'s in the internal groups instead of limiting them
to 0 through 5 matches, then it would take forever--or until you ran
out of stack space.  Moreover, these internal optimizations are not
always applicable.  For example, if you put C<{0,5}> instead of C<*>
on the external group, no current optimization is applicable, and the
match takes a long time to finish.

A powerful tool for optimizing such beasts is what is known as an
"independent group",
which does not backtrack (see L<C<< (?>pattern) >>>).  Note also that
zero-length look-ahead/look-behind assertions will not backtrack to make
the tail match, since they are in "logical" context: only 
whether they match is considered relevant.  For an example
where side-effects of look-ahead I<might> have influenced the
following match, see L<C<< (?>pattern) >>>.

=head2 Version 8 Regular Expressions

In case you're not familiar with the "regular" Version 8 regex
routines, here are the pattern-matching rules not described above.

Any single character matches itself, unless it is a I<metacharacter>
with a special meaning described here or above.  You can cause
characters that normally function as metacharacters to be interpreted
literally by prefixing them with a "\" (e.g., "\." matches a ".", not any
character; "\\" matches a "\").  A series of characters matches that
series of characters in the target string, so the pattern C<blurfl>
would match "blurfl" in the target string.

You can specify a character class, by enclosing a list of characters
in C<[]>, which will match any one character from the list.  If the
first character after the "[" is "^", the class matches any character not
in the list.  Within a list, the "-" character specifies a
range, so that C<a-z> represents all characters between "a" and "z",
inclusive.  If you want either "-" or "]" itself to be a member of a
class, put it at the start of the list (possibly after a "^"), or
escape it with a backslash.  "-" is also taken literally when it is
at the end of the list, just before the closing "]".  (The
following all specify the same class of three characters: C<[-az]>,
C<[az-]>, and C<[a\-z]>.  All are different from C<[a-z]>, which
specifies a class containing twenty-six characters, even on EBCDIC
based coded character sets.)  Also, if you try to use the character 
classes C<\w>, C<\W>, C<\s>, C<\S>, C<\d>, or C<\D> as endpoints of 
a range, that's not a range, the "-" is understood literally.
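
A quick sketch of the difference between a literal "-" and a range, using
an invented list of one-character strings:

    my @chars = ("a", "z", "-", "m");
    my @three = grep { /^[-az]$/ } @chars;   # "a", "z", "-"
    my @range = grep { /^[a-z]$/ } @chars;   # "a", "z", "m"
    print "@three | @range\n";               # a z - | a z m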

Note also that the whole range idea is rather unportable between
character sets--and even within character sets they may cause results
you probably didn't expect.  A sound principle is to use only ranges
that begin from and end at either alphabets of equal case ([a-e],
[A-E]), or digits ([0-9]).  Anything else is unsafe.  If in doubt,
spell out the character sets in full.

Characters may be specified using a metacharacter syntax much like that
used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
"\f" a form feed, etc.  More generally, \I<nnn>, where I<nnn> is a string
of octal digits, matches the character whose coded character set value 
is I<nnn>.  Similarly, \xI<nn>, where I<nn> are hexadecimal digits, 
matches the character whose numeric value is I<nn>. The expression \cI<x> 
matches the character control-I<x>.  Finally, the "." metacharacter 
matches any character except "\n" (unless you use C</s>).
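
The following sketch exercises these escapes.  It assumes an ASCII-based
character set, where "A" has the numeric value 65:

    print "octal\n"   if "A"  =~ /^\101$/;   # octal 101 is 65
    print "hex\n"     if "A"  =~ /^\x41$/;   # hex 41 is also 65
    print "control\n" if "\t" =~ /^\cI$/;    # control-I is the tab character
    print "dot\n"     if "\n" !~ /^.$/;      # "." does not match a newline

On such a system all four lines are printed.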

You can specify a series of alternatives for a pattern using "|" to
separate them, so that C<fee|fie|foe> will match any of "fee", "fie",
or "foe" in the target string (as would C<f(e|i|o)e>).  The
first alternative includes everything from the last pattern delimiter
("(", "[", or the beginning of the pattern) up to the first "|", and
the last alternative contains everything from the last "|" to the next
pattern delimiter.  That's why it's common practice to include
alternatives in parentheses: to minimize confusion about where they
start and end.

Alternatives are tried from left to right, so the first
alternative found for which the entire expression matches, is the one that
is chosen. This means that alternatives are not necessarily greedy. For
example: when matching C<foo|foot> against "barefoot", only the "foo"
part will match, as that is the first alternative tried, and it successfully
matches the target string. (This might not seem important, but it is
important when you are capturing matched text using parentheses.)

Also remember that "|" is interpreted as a literal within square brackets,
so if you write C<[fee|fie|foe]> you're really only matching C<[feio|]>.
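
To see the left-to-right rule at work, capture the alternation and swap
the order of the alternatives (a minimal sketch):

    "barefoot" =~ /(foo|foot)/ and print "first:  $1\n";   # first:  foo
    "barefoot" =~ /(foot|foo)/ and print "second: $1\n";   # second: foot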

Within a pattern, you may designate subpatterns for later reference
by enclosing them in parentheses, and you may refer back to the
I<n>th subpattern later in the pattern using the metacharacter
\I<n>.  Subpatterns are numbered based on the left to right order
of their opening parenthesis.  A backreference matches whatever
actually matched the subpattern in the string being examined, not
the rules for that subpattern.  Therefore, C<(0|0x)\d*\s\1\d*> will
match "0x1234 0x4321", but not "0x1234 01234", because subpattern
1 matched "0x", even though the rule C<0|0x> could potentially match
the leading 0 in the second number.
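
The two cases just described can be checked with a short sketch:

    for my $s ("0x1234 0x4321", "0x1234 01234") {
        my $ok = $s =~ /(0|0x)\d*\s\1\d*/;
        print "$s: ", ($ok ? "matches" : "fails"), "\n";
    }
    # prints "0x1234 0x4321: matches", then "0x1234 01234: fails"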

=head2 Warning on \1 vs $1

Some people get too used to writing things like:

    $pattern =~ s/(\W)/\\\1/g;

This is grandfathered for the RHS of a substitute to avoid shocking the
B<sed> addicts, but it's a dirty habit to get into.  That's because in
PerlThink, the righthand side of a C<s///> is a double-quoted string.  C<\1> in
the usual double-quoted string means a control-A.  The customary Unix
meaning of C<\1> is kludged in for C<s///>.  However, if you get into the habit
of doing that, you get yourself into trouble if you then add an C</e>
modifier.

    s/(\d+)/ \1 + 1 /eg;    	# causes warning under -w

Or if you try to do

    s/(\d+)/\1000/;

You can't disambiguate that by saying C<\{1}000>, whereas you can fix it with
C<${1}000>.  The operation of interpolation should not be confused
with the operation of matching a backreference.  Certainly they mean two
different things on the I<left> side of the C<s///>.
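
A minimal sketch of the C<${1}000> form, with an invented input string:

    my $price = "7 apples";
    $price =~ s/(\d+)/${1}000/;
    print "$price\n";               # 7000 apples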

=head2 Repeated patterns matching zero-length substring

B<WARNING>: Difficult material (and prose) ahead.  This section needs a rewrite.

Regular expressions provide a terse and powerful programming language.  As
with most other power tools, power comes together with the ability
to wreak havoc.

A common abuse of this power stems from the ability to make infinite
loops using regular expressions, with something as innocuous as:

    'foo' =~ m{ ( o? )* }x;

The C<o?> can match at the beginning of C<'foo'>, and since the position
in the string is not moved by the match, C<o?> would match again and again
because of the C<*> modifier.  Another common way to create a similar cycle
is with the looping modifier C<//g>:

    @matches = ( 'foo' =~ m{ o? }xg );

or

    print "match: <$&>\n" while 'foo' =~ m{ o? }xg;

or the loop implied by split().

However, long experience has shown that many programming tasks may
be significantly simplified by using repeated subexpressions that
may match zero-length substrings.  Here's a simple example:

    @chars = split //, $string;		  # // is not magic in split
    ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /

Thus Perl allows such constructs, by I<forcefully breaking
the infinite loop>.  The rules for this are different for lower-level
loops given by the greedy modifiers C<*+{}>, and for higher-level
ones like the C</g> modifier or split() operator.

The lower-level loops are I<interrupted> (that is, the loop is
broken) when Perl detects that a repeated expression matched a
zero-length substring.   Thus

   m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;

is made equivalent to 

   m{   (?: NON_ZERO_LENGTH )* 
      | 
        (?: ZERO_LENGTH )? 
    }x;

The higher-level loops preserve an additional state between iterations:
whether the last match was zero-length.  To break the loop, the following 
match after a zero-length match is prohibited to have a length of zero.
This prohibition interacts with backtracking (see L<"Backtracking">), 
and so the I<second best> match is chosen if the I<best> match is of
zero length.

For example:

    $_ = 'bar';
    s/\w??/<$&>/g;

results in C<< <><b><><a><><r><> >>.  At each position of the string the best
match given by non-greedy C<??> is the zero-length match, and the I<second 
best> match is what is matched by C<\w>.  Thus zero-length matches
alternate with one-character-long matches.

Similarly, for repeated C<m/()/g> the second-best match is the match at the 
position one notch further in the string.

The additional state of being I<matched with zero-length> is associated with
the matched string, and is reset by each assignment to pos().
Zero-length matches at the end of the previous match are ignored
during C<split>.

=head2 Combining pieces together

Each of the elementary pieces of regular expressions which were described
before (such as C<ab> or C<\Z>) could match at most one substring
at the given position of the input string.  However, in a typical regular
expression these elementary pieces are combined into more complicated
patterns using combining operators C<ST>, C<S|T>, C<S*> etc
(in these examples C<S> and C<T> are regular subexpressions).

Such combinations can include alternatives, leading to a problem of choice:
if we match a regular expression C<a|ab> against C<"abc">, will it match
substring C<"a"> or C<"ab">?  One way to describe which substring is
actually matched is the concept of backtracking (see L<"Backtracking">).
However, this description is too low-level and makes you think
in terms of a particular implementation.

Another description starts with notions of "better"/"worse".  All the
substrings which may be matched by the given regular expression can be
sorted from the "best" match to the "worst" match, and it is the "best"
match which is chosen.  This substitutes the question of "what is chosen?"
by the question of "which matches are better, and which are worse?".

Again, for elementary pieces there is no such question, since at most
one match at a given position is possible.  This section describes the
notion of better/worse for combining operators.  In the description
below C<S> and C<T> are regular subexpressions.

=over 4

=item C<ST>

Consider two possible matches, C<AB> and C<A'B'>, where C<A> and C<A'> are
substrings which can be matched by C<S>, and C<B> and C<B'> are substrings
which can be matched by C<T>.

If C<A> is a better match for C<S> than C<A'>, C<AB> is a better
match than C<A'B'>.

If C<A> and C<A'> coincide: C<AB> is a better match than C<AB'> if
C<B> is a better match for C<T> than C<B'>.

=item C<S|T>

When C<S> can match, it is a better match than when only C<T> can match.

Ordering of two matches for C<S> is the same as for C<S>.  Similar for
two matches for C<T>.

=item C<S{REPEAT_COUNT}>

Matches as C<SSS...S> (repeated as many times as necessary).

=item C<S{min,max}>

Matches as C<S{max}|S{max-1}|...|S{min+1}|S{min}>.

=item C<S{min,max}?>

Matches as C<S{min}|S{min+1}|...|S{max-1}|S{max}>.

=item C<S?>, C<S*>, C<S+>

Same as C<S{0,1}>, C<S{0,BIG_NUMBER}>, C<S{1,BIG_NUMBER}> respectively.

=item C<S??>, C<S*?>, C<S+?>

Same as C<S{0,1}?>, C<S{0,BIG_NUMBER}?>, C<S{1,BIG_NUMBER}?> respectively.

=item C<< (?>S) >>

Matches the best match for C<S> and only that.

=item C<(?=S)>, C<(?<=S)>

Only the best match for C<S> is considered.  (This is important only if
C<S> has capturing parentheses, and backreferences are used somewhere
else in the whole regular expression.)

=item C<(?!S)>, C<(?<!S)>

For this grouping operator there is no need to describe the ordering, since
only whether or not C<S> can match is important.

=item C<(??{ EXPR })>

The ordering is the same as for the regular expression which is
the result of EXPR.

=item C<(?(condition)yes-pattern|no-pattern)>

Recall that which of C<yes-pattern> or C<no-pattern> actually matches is
already determined.  The ordering of the matches is the same as for the
chosen subexpression.

=back

The above recipes describe the ordering of matches I<at a given position>.
One more rule is needed to understand how a match is determined for the
whole regular expression: a match at an earlier position is always better
than a match at a later position.
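
Both rules can be observed in a short sketch.  The strings are chosen for
illustration only:

    "abc" =~ /a|ab/ and print "<$&>\n";   # <a>: the first alternative wins
    "xab" =~ /b|ab/ and print "<$&>\n";   # <ab>: the earlier position wins

In the second match the first alternative C<b> could match at a later
position, but C<ab> matches earlier in the string and is therefore the
better match.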

=head2 Creating custom RE engines

Overloaded constants (see L<overload>) provide a simple way to extend
the functionality of the RE engine.

Suppose that we want to enable a new RE escape-sequence C<\Y|> which
matches at a boundary between whitespace characters and non-whitespace
characters.  Note that C<(?=\S)(?<!\S)|(?!\S)(?<=\S)> matches exactly
at these positions, so we want to have each C<\Y|> in the place of the
more complicated version.  We can create a module C<customre> to do
this:

    package customre;
    use overload;

    sub import {
      shift;
      die "No argument to customre::import allowed" if @_;
      overload::constant 'qr' => \&convert;
    }

    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}

    my %rules = ( '\\' => '\\', 
		  'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
    sub convert {
      my $re = shift;
      $re =~ s{ 
                \\ ( \\ | Y . )
              }
              { $rules{$1} or invalid($re,$1) }sgex; 
      return $re;
    }

Now C<use customre> enables the new escape in constant regular
expressions, i.e., those without any runtime variable interpolations.
As documented in L<overload>, this conversion will work only over
literal parts of regular expressions.  For C<\Y|$re\Y|> the variable
part of this regular expression needs to be converted explicitly
(but only if the special meaning of C<\Y|> should be enabled inside $re):

    use customre;
    $re = <>;
    chomp $re;
    $re = customre::convert $re;
    /\Y|$re\Y|/;

=head1 BUGS

This document varies from difficult to understand to completely
and utterly opaque.  The wandering prose riddled with jargon is
hard to fathom in several places.

This document needs a rewrite that separates the tutorial content
from the reference content.

=head1 SEE ALSO

L<perlop/"Regexp Quote-Like Operators">.

L<perlop/"Gory details of parsing quoted constructs">.

L<perlfaq6>.

L<perlfunc/pos>.

L<perllocale>.

L<perlebcdic>.

I<Mastering Regular Expressions> by Jeffrey Friedl, published
by O'Reilly and Associates.
   p e r l a p i . p o d t a . p o d   C h e c k ) d   M e                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 =head1 NAME

perlref - Perl references and nested data structures

=head1 NOTE

This is complete documentation about all aspects of references.
For a shorter, tutorial introduction to just the essential features,
see L<perlreftut>.

=head1 DESCRIPTION

Before release 5 of Perl it was difficult to represent complex data
structures, because all references had to be symbolic--and even then
it was difficult to refer to a variable instead of a symbol table entry.
Perl now not only makes it easier to use symbolic references to variables,
but also lets you have "hard" references to any piece of data or code.
Any scalar may hold a hard reference.  Because arrays and hashes contain
scalars, you can now easily build arrays of arrays, arrays of hashes,
hashes of arrays, arrays of hashes of functions, and so on.

Hard references are smart--they keep track of reference counts for you,
automatically freeing the thing referred to when its reference count goes
to zero.  (Reference counts for values in self-referential or
cyclic data structures may not go to zero without a little help; see
L<perlobj/"Two-Phased Garbage Collection"> for a detailed explanation.)
If that thing happens to be an object, the object is destructed.  See
L<perlobj> for more about objects.  (In a sense, everything in Perl is an
object, but we usually reserve the word for references to objects that
have been officially "blessed" into a class package.)

Symbolic references are names of variables or other objects, just as a
symbolic link in a Unix filesystem contains merely the name of a file.
The C<*glob> notation is something of a symbolic reference.  (Symbolic
references are sometimes called "soft references", but please don't call
them that; references are confusing enough without useless synonyms.)

In contrast, hard references are more like hard links in a Unix file
system: They are used to access an underlying object without concern for
what its (other) name is.  When the word "reference" is used without an
adjective, as in the following paragraph, it is usually talking about a
hard reference.

References are easy to use in Perl.  There is just one overriding
principle: Perl does no implicit referencing or dereferencing.  When a
scalar is holding a reference, it always behaves as a simple scalar.  It
doesn't magically start being an array or hash or subroutine; you have to
tell it explicitly to do so, by dereferencing it.

=head2 Making References

References can be created in several ways.

=over 4

=item 1.

By using the backslash operator on a variable, subroutine, or value.
(This works much like the & (address-of) operator in C.)  
This typically creates I<another> reference to a variable, because
there's already a reference to the variable in the symbol table.  But
the symbol table reference might go away, and you'll still have the
reference that the backslash returned.  Here are some examples:

    $scalarref = \$foo;
    $arrayref  = \@ARGV;
    $hashref   = \%ENV;
    $coderef   = \&handler;
    $globref   = \*foo;

It isn't possible to create a true reference to an IO handle (filehandle
or dirhandle) using the backslash operator.  The most you can get is a
reference to a typeglob, which is actually a complete symbol table entry.
But see the explanation of the C<*foo{THING}> syntax below.  However,
you can still use typeglobs and globrefs as though they were IO handles.

=item 2.

A reference to an anonymous array can be created using square
brackets:

    $arrayref = [1, 2, ['a', 'b', 'c']];

Here we've created a reference to an anonymous array of three elements
whose final element is itself a reference to another anonymous array of three
elements.  (The multidimensional syntax described later can be used to
access this.  For example, after the above, C<< $arrayref->[2][1] >> would have
the value "b".)

Taking a reference to an enumerated list is not the same
as using square brackets--instead it's the same as creating
a list of references!

    @list = (\$a, \@b, \%c);
    @list = \($a, @b, %c);	# same thing!

As a special case, C<\(@foo)> returns a list of references to the contents
of C<@foo>, not a reference to C<@foo> itself.  Likewise for C<%foo>,
except that the key references are to copies (since the keys are just
strings rather than full-fledged scalars).
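
The difference between C<\@foo> and C<\(@foo)> is easy to check for
yourself.  The following is only a small sketch; the variable names
C<$whole> and C<@each> are made up for the illustration.

```perl
use strict;
use warnings;

my @foo = (10, 20, 30);

my $whole = \@foo;      # one reference to the array itself
my @each  = \(@foo);    # a list of references, one per element

print ref($whole), "\n";        # ARRAY
print scalar(@each), "\n";      # 3
print ref($each[0]), "\n";      # SCALAR

# The element references point at the real elements, so writing
# through one of them changes @foo.
${ $each[1] } = 99;
print "@foo\n";                 # 10 99 30
```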

=item 3.

A reference to an anonymous hash can be created using curly
brackets:

    $hashref = {
	'Adam'  => 'Eve',
	'Clyde' => 'Bonnie',
    };

Anonymous hash and array composers like these can be intermixed freely to
produce as complicated a structure as you want.  The multidimensional
syntax described below works for these too.  The values above are
literals, but variables and expressions would work just as well, because
assignment operators in Perl (even within local() or my()) are executable
statements, not compile-time declarations.
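
As a rough illustration of mixing the two composers and of using
variables and expressions inside them, consider the sketch below.  The
record layout and the names C<$record>, C<@friends>, and C<$n> are
invented for this example, and the arrow syntax used to read the data
back is the one described later in this document.

```perl
use strict;
use warnings;

my @friends = ('Adam', 'Eve');
my $n       = 6;

my $record = {
    name    => 'Clyde',
    friends => [ @friends, 'Bonnie' ],    # array built from a variable
    answer  => $n * 7,                    # any expression will do
    partner => { name => 'Bonnie' },      # hash inside a hash
};

print $record->{friends}[2], "\n";        # Bonnie
print $record->{answer}, "\n";            # 42
print $record->{partner}{name}, "\n";     # Bonnie
```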

Because curly brackets (braces) are used for several other things
including BLOCKs, you may occasionally have to disambiguate braces at the
beginning of a statement by putting a C<+> or a C<return> in front so
that Perl realizes the opening brace isn't starting a BLOCK.  The economy and
mnemonic value of using curlies is deemed worth this occasional extra
hassle.

For example, if you wanted a function to make a new hash and return a
reference to it, you have these options:

    sub hashem {        { @_ } }   # silently wrong
    sub hashem {       +{ @_ } }   # ok
    sub hashem { return { @_ } }   # ok

On the other hand, if you want the other meaning, you can do this:

    sub showem {        { @_ } }   # ambiguous (currently ok, but may change)
    sub showem {       {; @_ } }   # ok
    sub showem { { return @_ } }   # ok

The leading C<+{> and C<{;> always serve to disambiguate
the expression to mean either the HASH reference, or the BLOCK.

=item 4.

A reference to an anonymous subroutine can be created by using
C<sub> without a subname:

    $coderef = sub { print "Boink!\n" };

Note the semicolon.  Except for the code
inside not being immediately executed, a C<sub {}> is not so much a
declaration as it is an operator, like C<do{}> or C<eval{}>.  (However, no
matter how many times you execute that particular line (unless you're in an
C<eval("...")>), $coderef will still have a reference to the I<same>
anonymous subroutine.)

Anonymous subroutines act as closures with respect to my() variables,
that is, variables lexically visible within the current scope.  Closure
is a notion out of the Lisp world that says if you define an anonymous
function in a particular lexical context, it pretends to run in that
context even when it's called outside the context.

In human terms, it's a funny way of passing arguments to a subroutine when
you define it as well as when you call it.  It's useful for setting up
little bits of code to run later, such as callbacks.  You can even
do object-oriented stuff with it, though Perl already provides a different
mechanism to do that--see L<perlobj>.

You might also think of closure as a way to write a subroutine
template without using eval().  Here's a small example of how
closures work:

    sub newprint {
	my $x = shift;
	return sub { my $y = shift; print "$x, $y!\n"; };
    }
    $h = newprint("Howdy");
    $g = newprint("Greetings");

    # Time passes...

    &$h("world");
    &$g("earthlings");

This prints

    Howdy, world!
    Greetings, earthlings!

Note particularly that $x continues to refer to the value passed
into newprint() I<despite> "my $x" having gone out of scope by the
time the anonymous subroutine runs.  That's what a closure is all
about.

This applies only to lexical variables, by the way.  Dynamic variables
continue to work as they have always worked.  Closure is not something
that most Perl programmers need trouble themselves about to begin with.
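
If you do want to trouble yourself, here is one more sketch showing
that each closure keeps its own private copy of the lexical it captured.
The function name make_counter() is hypothetical, chosen only for this
example.

```perl
use strict;
use warnings;

# make_counter() is an illustrative helper, not a Perl built-in.
sub make_counter {
    my $count = shift;                  # a fresh $count for every call
    return sub { return $count++ };     # the closure captures that $count
}

my $c1 = make_counter(1);
my $c2 = make_counter(100);

print &$c1(), "\n";     # 1
print &$c1(), "\n";     # 2
print &$c2(), "\n";     # 100
print &$c1(), "\n";     # 3
```

The two counters never interfere with each other, because every call
to make_counter() creates a new C<$count>.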

=item 5.

References are often returned by special subroutines called constructors.
Perl objects are just references to a special type of object that happens to know
which package it's associated with.  Constructors are just special
subroutines that know how to create that association.  They do so by
starting with an ordinary reference, and it remains an ordinary reference
even while it's also being an object.  Constructors are often
named new() and called indirectly:

    $objref = new Doggie (Tail => 'short', Ears => 'long');

But they don't have to be:

    $objref   = Doggie->new(Tail => 'short', Ears => 'long');

    use Term::Cap;
    $terminal = Term::Cap->Tgetent( { OSPEED => 9600 });

    use Tk;
    $main    = MainWindow->new();
    $menubar = $main->Frame(-relief              => "raised",
                            -borderwidth         => 2);

=item 6.

References of the appropriate type can spring into existence if you
dereference them in a context that assumes they exist.  Because we haven't
talked about dereferencing yet, we can't show you any examples yet.

=item 7.

A reference can be created by using a special syntax, lovingly known as
the *foo{THING} syntax.  *foo{THING} returns a reference to the THING
slot in *foo (which is the symbol table entry which holds everything
known as foo).

    $scalarref = *foo{SCALAR};
    $arrayref  = *ARGV{ARRAY};
    $hashref   = *ENV{HASH};
    $coderef   = *handler{CODE};
    $ioref     = *STDIN{IO};
    $globref   = *foo{GLOB};

All of these are self-explanatory except for C<*foo{IO}>.  It returns
the IO handle, used for file handles (L<perlfunc/open>), sockets
(L<perlfunc/socket> and L<perlfunc/socketpair>), and directory
handles (L<perlfunc/opendir>).  For compatibility with previous
versions of Perl, C<*foo{FILEHANDLE}> is a synonym for C<*foo{IO}>.

C<*foo{THING}> returns undef if that particular THING hasn't been used yet,
except in the case of scalars.  C<*foo{SCALAR}> returns a reference to an
anonymous scalar if $foo hasn't been used yet.  This might change in a
future release.
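
A minimal sketch of that behavior follows.  It assumes a package
variable C<@seen> that exists only for this illustration, and, as noted
above, the treatment of the SCALAR slot may differ in future releases.

```perl
use strict;
use warnings;

our @seen = (1, 2, 3);          # @seen is used, so its slot is filled

my $aref = *seen{ARRAY};        # reference to @seen
my $href = *seen{HASH};         # %seen has never been used
my $sref = *seen{SCALAR};       # $seen has never been used either

print "array: @$aref\n";
print "hash slot is ", (defined $href ? "set" : "undef"), "\n";
print "scalar slot gives a ", ref($sref), " reference\n";
```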

C<*foo{IO}> is an alternative to the C<*HANDLE> mechanism given in
L<perldata/"Typeglobs and Filehandles"> for passing filehandles
into or out of subroutines, or storing into larger data structures.
Its disadvantage is that it won't create a new filehandle for you.
Its advantage is that you have less risk of clobbering more than
you want to with a typeglob assignment.  (It still conflates file
and directory handles, though.)  However, if you assign the incoming
value to a scalar instead of a typeglob as we do in the examples
below, there's no risk of that happening.

    splutter(*STDOUT);		# pass the whole glob
    splutter(*STDOUT{IO});	# pass both file and dir handles

    sub splutter {
	my $fh = shift;
	print $fh "her um well a hmmm\n";
    }

    $rec = get_rec(*STDIN);	# pass the whole glob
    $rec = get_rec(*STDIN{IO}); # pass both file and dir handles

    sub get_rec {
	my $fh = shift;
	return scalar <$fh>;
    }

=back

=head2 Using References

That's it for creating references.  By now you're probably dying to
know how to use references to get back to your long-lost data.  There
are several basic methods.

=over 4

=item 1.

Anywhere you'd put an identifier (or chain of identifiers) as part
of a variable or subroutine name, you can replace the identifier with
a simple scalar variable containing a reference of the correct type:

    $bar = $$scalarref;
    push(@$arrayref, $filename);
    $$arrayref[0] = "January";
    $$hashref{"KEY"} = "VALUE";
    &$coderef(1,2,3);
    print $globref "output\n";

It's important to understand that we are specifically I<not> dereferencing
C<$arrayref[0]> or C<$hashref{"KEY"}> there.  The dereference of the
scalar variable happens I<before> it does any key lookups.  Anything more
complicated than a simple scalar variable must use methods 2 or 3 below.
However, a "simple scalar" includes an identifier that itself uses method
1 recursively.  Therefore, the following prints "howdy".

    $refrefref = \\\"howdy";
    print $$$$refrefref;

=item 2.

Anywhere you'd put an identifier (or chain of identifiers) as part of a
variable or subroutine name, you can replace the identifier with a
BLOCK returning a reference of the correct type.  In other words, the
previous examples could be written like this:

    $bar = ${$scalarref};
    push(@{$arrayref}, $filename);
    ${$arrayref}[0] = "January";
    ${$hashref}{"KEY"} = "VALUE";
    &{$coderef}(1,2,3);
    $globref->print("output\n");  # iff IO::Handle is loaded

Admittedly, it's a little silly to use the curlies in this case, but
the BLOCK can contain any arbitrary expression, in particular,
subscripted expressions:

    &{ $dispatch{$index} }(1,2,3);	# call correct routine

Because of being able to omit the curlies for the simple case of C<$$x>,
people often make the mistake of viewing the dereferencing symbols as
proper operators, and wonder about their precedence.  If they were,
though, you could use parentheses instead of braces.  That's not the case.
Consider the difference below; case 0 is a short-hand version of case 1,
I<not> case 2:

    $$hashref{"KEY"}   = "VALUE";	# CASE 0
    ${$hashref}{"KEY"} = "VALUE";	# CASE 1
    ${$hashref{"KEY"}} = "VALUE";	# CASE 2
    ${$hashref->{"KEY"}} = "VALUE";	# CASE 3

Case 2 is also deceptive in that you're accessing a variable
called %hashref, not dereferencing through $hashref to the hash
it's presumably referencing.  That would be case 3.
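
To make the distinction concrete, the sketch below deliberately sets up
both a scalar C<$hashref> and an unrelated hash C<%hashref>.  The second
hash and the variable C<$target> are assumptions added purely for the
demonstration.

```perl
use strict;
use warnings;

my $hashref = { KEY => 'old' };         # reference to an anonymous hash
my $target  = 'untouched';
my %hashref = ( KEY => \$target );      # unrelated hash sharing the name

$$hashref{KEY}   = 'case 0';    # same as case 1: stores via $hashref
${$hashref{KEY}} = 'case 2';    # reads %hashref, then dereferences that

print $hashref->{KEY}, "\n";    # case 0
print $target, "\n";            # case 2
```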

=item 3.

Subroutine calls and lookups of individual array elements arise often
enough that it gets cumbersome to use method 2.  As a form of
syntactic sugar, the examples for method 2 may be written:

    $arrayref->[0] = "January";   # Array element
    $hashref->{"KEY"} = "VALUE";  # Hash element
    $coderef->(1,2,3);            # Subroutine call

The left side of the arrow can be any expression returning a reference,
including a previous dereference.  Note that C<$array[$x]> is I<not> the
same thing as C<< $array->[$x] >> here:

    $array[$x]->{"foo"}->[0] = "January";

This is one of the cases we mentioned earlier in which references could
spring into existence when in an lvalue context.  Before this
statement, C<$array[$x]> may have been undefined.  If so, it's
automatically defined with a hash reference so that we can look up
C<{"foo"}> in it.  Likewise C<< $array[$x]->{"foo"} >> will automatically get
defined with an array reference so that we can look up C<[0]> in it.
This process is called I<autovivification>.

One more thing here.  The arrow is optional I<between> bracket
subscripts, so you can shrink the above down to

    $array[$x]{"foo"}[0] = "January";

Which, in the degenerate case of using only ordinary arrays, gives you
multidimensional arrays just like C's:

    $score[$x][$y][$z] += 42;

Well, okay, not entirely like C's arrays, actually.  C doesn't know how
to grow its arrays on demand.  Perl does.
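
Both autovivification and growing on demand can be watched in action.
This is just a sketch; the index C<3> and the subscripts are arbitrary
values picked for the example.

```perl
use strict;
use warnings;

my @array;                              # completely empty
my $x = 3;

$array[$x]{"foo"}[0] = "January";       # creates everything on the way

print scalar(@array), "\n";             # 4 (indices 0..3 now exist)
print ref($array[$x]), "\n";            # HASH
print ref($array[$x]{"foo"}), "\n";     # ARRAY
print $array[$x]{"foo"}[0], "\n";       # January

my @score;
$score[1][2][3] += 42;                  # three levels spring into being
print $score[1][2][3], "\n";            # 42
```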

=item 4.

If a reference happens to be a reference to an object, then there are
probably methods to access the things referred to, and you should probably
stick to those methods unless you're in the class package that defines the
object's methods.  In other words, be nice, and don't violate the object's
encapsulation without a very good reason.  Perl does not enforce
encapsulation.  We are not totalitarians here.  We do expect some basic
civility though.

=back

Using a string or number as a reference produces a symbolic reference,
as explained above.  Using a reference as a number produces an
integer representing its storage location in memory.  The only
useful thing to be done with this is to compare two references
numerically to see whether they refer to the same location.

    if ($ref1 == $ref2) {  # cheap numeric compare of references
	print "refs 1 and 2 refer to the same thing\n";
    }

Using a reference as a string produces both its referent's type,
including any package blessing as described in L<perlobj>, as well
as the numeric address expressed in hex.  The ref() operator returns
just the type of thing the reference is pointing to, without the
address.  See L<perlfunc/ref> for details and examples of its use.
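
For instance, in the sketch below the class name C<Doggie> is only an
illustrative label, and the addresses shown in the comments are
invented, since the real ones vary from run to run.

```perl
use strict;
use warnings;

my $plain   = [1, 2, 3];
my $blessed = bless {}, 'Doggie';   # Doggie is a made-up class name

print "$plain\n";           # something like ARRAY(0x55d0c8a3e4f0)
print "$blessed\n";         # something like Doggie=HASH(0x55d0c8a3e6a8)

print ref($plain), "\n";    # ARRAY
print ref($blessed), "\n";  # Doggie
```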

The bless() operator may be used to associate the object a reference
points to with a package functioning as an object class.  See L<perlobj>.

A typeglob may be dereferenced the same way a reference can, because
the dereference syntax always indicates the type of reference desired.
So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable.
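
A short sketch of this, assuming package variables C<$foo> and C<@foo>
that exist only for the demonstration:

```perl
use strict;
use warnings;

our $foo = "scalar";
our @foo = (1, 2, 3);

print ${*foo}, "\n";                # scalar
print ${\$foo}, "\n";               # scalar
print join(" ", @{*foo}), "\n";     # 1 2 3

${*foo} = "changed";                # assigns to $foo through the glob
print "$foo\n";                     # changed
```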

Here's a trick for interpolating a subroutine call into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

The way it works is that when the C<@{...}> is seen in the double-quoted
string, it's evaluated as a block.  The block creates a reference to an
anonymous array containing the results of the call to C<mysub(1,2,3)>.  So
the whole block returns a reference to an array, which is then
dereferenced by C<@{...}> and stuck into the double-quoted string. This
chicanery is also useful for arbitrary expressions:

    print "That yields @{[$n + 5]} widgets\n";

=head2 Symbolic references

We said that references spring into existence as necessary if they are
undefined, but we didn't say what happens if a value used as a
reference is already defined, but I<isn't> a hard reference.  If you
use it as a reference, it'll be treated as a symbolic
reference.  That is, the value of the scalar is taken to be the I<name>
of a variable, rather than a direct link to a (possibly) anonymous
value.

People frequently expect it to work like this.  So it does.

    $name = "foo";
    $$name = 1;			# Sets $foo
    ${$name} = 2;		# Sets $foo
    ${$name x 2} = 3;		# Sets $foofoo
    $name->[0] = 4;		# Sets $foo[0]
    @$name = ();		# Clears @foo
    &$name();			# Calls &foo() (as in Perl 4)
    $pack = "THAT";
    ${"${pack}::$name"} = 5;	# Sets $THAT::foo without eval

This is powerful, and slightly dangerous, in that it's possible
to intend (with the utmost sincerity) to use a hard reference, and
accidentally use a symbolic reference instead.  To protect against
that, you can say

    use strict 'refs';

and then only hard references will be allowed for the rest of the enclosing
block.  An inner block may countermand that with

    no strict 'refs';
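
The following sketch shows both halves of that at work.  It wraps the
offending statement in C<eval {}> so that the script can report the
error and carry on; the variable names are chosen only for the example.

```perl
use strict;
use warnings;

our $foo = "original";
my $name = "foo";

# Under strict 'refs', using a string as a reference is a run-time error.
my $ok = eval { $$name = "clobbered"; 1 };
print "blocked: $@" unless $ok;

{
    no strict 'refs';               # countermand it for this block only
    $$name = "set symbolically";
}
print "$foo\n";                     # set symbolically
```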

Only package variables (globals, even if localized) are visible to
symbolic references.  Lexical variables (declared with my()) aren't in
a symbol table, and thus are invisible to this mechanism.  For example:

    local $value = 10;
    $ref = "value";
    {
	my $value = 20;
	print $$ref;
    }

This will still print 10, not 20.  Remember that local() affects package
variables, which are all "global" to the package.

=head2 Not-so-symbolic references

A new feature contributing to readability in perl version 5.001 is that the
brackets around a symbolic reference behave more like quotes, just as they
always have within a string.  That is,

    $push = "pop on ";
    print "${push}over";

has always meant to print "pop on over", even though push is
a reserved word.  This has been generalized to work the same outside
of quotes, so that

    print ${push} . "over";

and even

    print ${ push } . "over";

will have the same effect.  (This would have been a syntax error in
Perl 5.000, though Perl 4 allowed it in the spaceless form.)  This
construct is I<not> considered to be a symbolic reference when you're
using strict refs:

    use strict 'refs';
    ${ bareword };	# Okay, means $bareword.
    ${ "bareword" };	# Error, symbolic reference.

Similarly, because of all the subscripting that is done using single
words, we've applied the same rule to any bareword that is used for
subscripting a hash.  So now, instead of writing

    $array{ "aaa" }{ "bbb" }{ "ccc" }

you can write just

    $array{ aaa }{ bbb }{ ccc }

and not worry about whether the subscripts are reserved words.  In the
rare event that you do wish to do something like

    $array{ shift }

you can force interpretation as a reserved word by adding anything that
makes it more than a bareword:

    $array{ shift() }
    $array{ +shift }
    $array{ shift @_ }

The C<use warnings> pragma or the B<-w> switch will warn you if it
interprets a reserved word as a string.
But it will no longer warn you about using lowercase words, because the
string is effectively quoted.

=head2 Pseudo-hashes: Using an array as a hash

B<WARNING>:  This section describes an experimental feature.  Details may
change without notice in future versions.

Beginning with release 5.005 of Perl, you may use an array reference
in some contexts that would normally require a hash reference.  This
allows you to access array elements using symbolic names, as if they
were fields in a structure.

For this to work, the array must contain extra information.  The first
element of the array has to be a hash reference that maps field names
to array indices.  Here is an example:

    $struct = [{foo => 1, bar => 2}, "FOO", "BAR"];

    $struct->{foo};  # same as $struct->[1], i.e. "FOO"
    $struct->{bar};  # same as $struct->[2], i.e. "BAR"

    keys %$struct;   # will return ("foo", "bar") in some order
    values %$struct; # will return ("FOO", "BAR") in the same order

    while (my($k,$v) = each %$struct) {
       print "$k => $v\n";
    }

Perl will raise an exception if you try to access nonexistent fields.
To avoid inconsistencies, always use the fields::phash() function
provided by the C<fields> pragma.

    use fields;
    $pseudohash = fields::phash(foo => "FOO", bar => "BAR");

For better performance, Perl can also do the translation from field
names to array indices at compile time for typed object references.
See L<fields>.

There are two ways to check for the existence of a key in a
pseudo-hash.  The first is to use exists().  This checks to see if the
given field has ever been set.  It acts this way to match the behavior
of a regular hash.  For instance:

    use fields;
    $phash = fields::phash([qw(foo bar pants)], ['FOO']);
    $phash->{pants} = undef;

    print exists $phash->{foo};    # true, 'foo' was set in the declaration
    print exists $phash->{bar};    # false, 'bar' has not been used.
    print exists $phash->{pants};  # true, your 'pants' have been touched

The second is to use exists() on the hash reference sitting in the
first array element.  This checks to see if the given key is a valid
field in the pseudo-hash.

    print exists $phash->[0]{bar};	# true, 'bar' is a valid field
    print exists $phash->[0]{shoes};	# false, 'shoes' can't be used

delete() on a pseudo-hash element only deletes the value corresponding
to the key, not the key itself.  To delete the key, you'll have to
explicitly delete it from the first hash element.

    print delete $phash->{foo};     # prints $phash->[1], "FOO"
    print exists $phash->{foo};     # false
    print exists $phash->[0]{foo};  # true, key still exists
    print delete $phash->[0]{foo};  # now key is gone
    print $phash->{foo};            # runtime exception

=head2 Function Templates

As explained above, a closure is an anonymous function with access to the
lexical variables visible when that function was compiled.  It retains
access to those variables even though it doesn't get run until later,
such as in a signal handler or a Tk callback.

Using a closure as a function template allows us to generate many functions
that act similarly.  Suppose you wanted functions named after the colors
that generated HTML font changes for the various colors:

    print "Be ", red("careful"), " with that ", green("light");

The red() and green() functions would be similar.  To create these,
we'll assign a closure to a typeglob of the name of the function we're
trying to build.  

    @colors = qw(red blue green yellow orange purple violet);
    for my $name (@colors) {
        no strict 'refs';	# allow symbol table manipulation
        *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
    } 

Now all those different functions appear to exist independently.  You can
call red(), RED(), blue(), BLUE(), green(), etc.  This technique saves on
both compile time and memory use, and is less error-prone as well, since
syntax checks happen at compile time.  It's critical that any variables in
the anonymous subroutine be lexicals in order to create a proper closure.
That's the reason for the C<my> on the loop iteration variable.

This is one of the only places where giving a prototype to a closure makes
much sense.  If you wanted to impose scalar context on the arguments of
these functions (probably not a wise idea for this particular example),
you could have written it this way instead:

    *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };

However, since prototype checking happens at compile time, the assignment
above happens too late to be of much use.  You could address this by
putting the whole loop of assignments within a BEGIN block, forcing it
to occur during compilation.
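
A minimal sketch of the BEGIN approach might look like this.  The
shortened color list and the variables C<@list> and C<$html> are
assumptions made for the example.

```perl
use strict;
use warnings;

BEGIN {
    for my $name (qw(red green blue)) {
        no strict 'refs';       # allow symbol table manipulation
        *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
    }
}

# The subs were installed during compilation, so the ($) prototype is
# already known here and imposes scalar context on the argument.
my @list = ('a', 'b', 'c');
my $html = red @list;           # @list in scalar context yields 3
print "$html\n";                # <FONT COLOR='red'>3</FONT>
```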

Access to lexicals that change over time--like those in the C<for> loop
above--only works with closures, not general subroutines.  In the general
case, then, named subroutines do not nest properly, although anonymous
ones do.  If you are accustomed to using nested subroutines in other
programming languages with their own private variables, you'll have to
work at it a bit in Perl.  The intuitive coding of this type of thing
incurs mysterious warnings about ``will not stay shared''.  For example,
this won't work:

    sub outer {
        my $x = $_[0] + 35;
        sub inner { return $x * 19 }   # WRONG
        return $x + inner();
    } 

A work-around is the following:

    sub outer {
        my $x = $_[0] + 35;
        local *inner = sub { return $x * 19 };
        return $x + inner();
    } 

Now inner() can only be called from within outer(), because of the
temporary assignment of the closure (anonymous subroutine).  But when
it is called, it has normal access to the lexical variable $x from the
scope of outer().

This has the interesting effect of creating a function local to another
function, something not normally supported in Perl.

=head1 WARNING

You may not (usefully) use a reference as the key to a hash.  It will be
converted into a string:

    $x{ \$a } = $a;

If you try to dereference the key, it won't do a hard dereference, and
you won't accomplish what you're attempting.  You might want to do something
more like

    $r = \@a;
    $x{ $r } = $r;

And then at least you can use the values(), which will be
real refs, instead of the keys(), which won't.

The standard Tie::RefHash module provides a convenient workaround to this.
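
Here is a small sketch contrasting an ordinary hash with one tied to
Tie::RefHash.  The hash names C<%plain> and C<%by_ref> are invented for
the illustration.

```perl
use strict;
use warnings;
use Tie::RefHash;

my $r = [1, 2, 3];

my %plain;
$plain{$r} = "plain";                   # key is stringified
my ($plain_key) = keys %plain;
print ref($plain_key) ? "ref" : "string", "\n";     # string

tie my %by_ref, 'Tie::RefHash';
$by_ref{$r} = "tied";                   # key stays a real reference
my ($ref_key) = keys %by_ref;
print ref($ref_key) ? "ref" : "string", "\n";       # ref
print "@$ref_key\n";                                # 1 2 3
```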

=head1 SEE ALSO

Besides the obvious documents, source code can be instructive.
Some pathological examples of the use of references can be found
in the F<t/op/ref.t> regression test in the Perl source directory.

See also L<perldsc> and L<perllol> for how to use references to create
complex data structures, and L<perltoot>, L<perlobj>, and L<perlbot>
for how to use them to create objects.
=head1 NAME

perlreftut - Mark's very short tutorial about references

=head1 DESCRIPTION

One of the most important new features in Perl 5 was the capability to
manage complicated data structures like multidimensional arrays and
nested hashes.  To enable these, Perl 5 introduced a feature called
`references', and using references is the key to managing complicated,
structured data in Perl.  Unfortunately, there's a lot of funny syntax
to learn, and the main manual page can be hard to follow.  The manual
is quite complete, and sometimes people find that a problem, because
it can be hard to tell what is important and what isn't.

Fortunately, you only need to know 10% of what's in the main page to get
90% of the benefit.  This page will show you that 10%.

=head1 Who Needs Complicated Data Structures?

One problem that came up all the time in Perl 4 was how to represent a
hash whose values were lists.  Perl 4 had hashes, of course, but the
values had to be scalars; they couldn't be lists.  

Why would you want a hash of lists?  Let's take a simple example: You
have a file of city and country names, like this:

	Chicago, USA
	Frankfurt, Germany
	Berlin, Germany
	Washington, USA
	Helsinki, Finland
	New York, USA

and you want to produce an output like this, with each country mentioned
once, and then an alphabetical list of the cities in that country:

	Finland: Helsinki.
	Germany: Berlin, Frankfurt.
	USA: Chicago, New York, Washington.

The natural way to do this is to have a hash whose keys are country
names.  Associated with each country name key is a list of the cities in
that country.  Each time you read a line of input, split it into a country
and a city, look up the list of cities already known to be in that
country, and append the new city to the list.  When you're done reading
the input, iterate over the hash as usual, sorting each list of cities
before you print it out.

If hash values can't be lists, you lose.  In Perl 4, hash values can't
be lists; they can only be strings.  You lose.  You'd probably have to
combine all the cities into a single string somehow, and then when
time came to write the output, you'd have to break the string into a
list, sort the list, and turn it back into a string.  This is messy
and error-prone.  And it's frustrating, because Perl already has
perfectly good lists that would solve the problem if only you could
use them.

=head1 The Solution

By the time Perl 5 rolled around, we were already stuck with this
design: Hash values must be scalars.  The solution to this is
references.

A reference is a scalar value that I<refers to> an entire array or an
entire hash (or to just about anything else).  Names are one kind of
reference that you're already familiar with.  Think of the President:
a messy, inconvenient bag of blood and bones.  But to talk about him,
or to represent him in a computer program, all you need is the easy,
convenient scalar string "Bill Clinton".

References in Perl are like names for arrays and hashes.  They're
Perl's private, internal names, so you can be sure they're
unambiguous.  Unlike "Bill Clinton", a reference only refers to one
thing, and you always know what it refers to.  If you have a reference
to an array, you can recover the entire array from it.  If you have a
reference to a hash, you can recover the entire hash.  But the
reference is still an easy, compact scalar value.

You can't have a hash whose values are arrays; hash values can only be
scalars.  We're stuck with that.  But a single reference can refer to
an entire array, and references are scalars, so you can have a hash of
references to arrays, and it'll act a lot like a hash of arrays, and
it'll be just as useful as a hash of arrays.

We'll come back to this city-country problem later, after we've seen
some syntax for managing references.


=head1 Syntax

There are just two ways to make a reference, and just two ways to use
it once you have it.

=head2 Making References

B<Make Rule 1>

If you put a C<\> in front of a variable, you get a
reference to that variable.

    $aref = \@array;         # $aref now holds a reference to @array
    $href = \%hash;          # $href now holds a reference to %hash

Once the reference is stored in a variable like $aref or $href, you
can copy it or store it just the same as any other scalar value:

    $xy = $aref;             # $xy now holds a reference to @array
    $p[3] = $href;           # $p[3] now holds a reference to %hash
    $z = $p[3];              # $z now holds a reference to %hash


These examples show how to make references to variables with names.
Sometimes you want to make an array or a hash that doesn't have a
name.  This is analogous to the way you like to be able to use the
string C<"\n"> or the number 80 without having to store it in a named
variable first.

B<Make Rule 2>

C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a
reference to that hash.

    $aref = [ 1, "foo", undef, 13 ];  
    # $aref now holds a reference to an array

    $href = { APR => 4, AUG => 8 };   
    # $href now holds a reference to a hash


The references you get from rule 2 are the same kind of
references that you get from rule 1:

	# This:
	$aref = [ 1, 2, 3 ];

	# Does the same as this:
	@array = (1, 2, 3);
	$aref = \@array;


The first line is an abbreviation for the following two lines, except
that it doesn't create the superfluous array variable C<@array>.


=head2 Using References

What can you do with a reference once you have it?  It's a scalar
value, and we've seen that you can store it as a scalar and get it back
again just like any scalar.  There are just two more ways to use it:

B<Use Rule 1>

If C<$aref> contains a reference to an array, then you
can put C<{$aref}> anywhere you would normally put the name of an
array.  For example, C<@{$aref}> instead of C<@array>.

Here are some examples of that:

Arrays:


	@a		@{$aref}		An array
	reverse @a	reverse @{$aref}	Reverse the array
	$a[3]		${$aref}[3]		An element of the array
	$a[3] = 17;	${$aref}[3] = 17	Assigning an element


On each line are two expressions that do the same thing.  The
left-hand versions operate on the array C<@a>, and the right-hand
versions operate on the array that is referred to by C<$aref>, but
once they find the array they're operating on, they do the same things
to the arrays.

Using a hash reference is I<exactly> the same:

	%h		%{$href}	      A hash
	keys %h		keys %{$href}	      Get the keys from the hash
	$h{'red'}	${$href}{'red'}	      An element of the hash
	$h{'red'} = 17	${$href}{'red'} = 17  Assigning an element


B<Use Rule 2>

C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >>
instead.

C<${$href}{red}> is too hard to read, so you can write
C<< $href->{red} >> instead.

Most often, when you have an array or a hash, you want to get or set a
single element from it.  C<${$aref}[3]> and C<${$href}{'red'}> have
too much punctuation, and Perl lets you abbreviate.

If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is
the fourth element of the array.  Don't confuse this with C<$aref[3]>,
which is the fourth element of a totally different array, one
deceptively named C<@aref>.  C<$aref> and C<@aref> are unrelated the
same way that C<$item> and C<@item> are.

Similarly, C<< $href->{'red'} >> is part of the hash referred to by
the scalar variable C<$href>, perhaps even one with no name.
C<$href{'red'}> is part of the deceptively named C<%href> hash.  It's
easy to leave out the C<< -> >> by accident, and if you do, you'll get
bizarre results when your program gets array and hash elements out of
totally unexpected hashes and arrays that weren't the ones you wanted
to use.
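
Here is a minimal sketch of the two use rules and of the pitfall just
described.  The sub name C<fourth_element> and the sample data are made
up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the fourth element of the array that
# $aref refers to, once with each use rule.
sub fourth_element {
    my ($aref) = @_;
    my $long  = ${$aref}[3];    # Use Rule 1: {$aref} stands in for the array name
    my $short = $aref->[3];     # Use Rule 2: the arrow abbreviation
    return ($long, $short);
}

my @aref = ('an', 'unrelated', 'array', 'entirely');   # nothing to do with $aref
my $aref = [ 10, 20, 30, 40 ];

my ($long, $short) = fourth_element($aref);
print "$long $short\n";    # the same element, fetched both ways
print "$aref[3]\n";        # no arrow, so this reads from @aref instead
```

The last line is the mistake to watch for: without the arrow, Perl
quietly looks in C<@aref> rather than in the array that C<$aref> refers to.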


=head1 An Example

Let's see a quick example of how all this is useful.

First, remember that C<[1, 2, 3]> makes an anonymous array containing
C<(1, 2, 3)>, and gives you a reference to that array.

Now think about

        @a = ( [1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]
             );

@a is an array with three elements, and each one is a reference to
another array.

C<$a[1]> is one of these references.  It refers to an array, the array
containing C<(4, 5, 6)>, and because it is a reference to an array,
B<USE RULE 2> says that we can write C<< $a[1]->[2] >> to get the
third element from that array.  C<< $a[1]->[2] >> is the 6.
Similarly, C<< $a[0]->[1] >> is the 2.  What we have here is like a
two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get
or set the element in any row and any column of the array.

The notation still looks a little cumbersome, so there's one more
abbreviation:  

=head1 Arrow Rule

In between two B<subscripts>, the arrow is optional.

Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the
same thing.  Instead of C<< $a[0]->[1] >>, we can write C<$a[0][1]>;
it means the same thing.

Now it really looks like two-dimensional arrays!

You can see why the arrows are important.  Without them, we would have
had to write C<${$a[1]}[2]> instead of C<$a[1][2]>.  For
three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
the unreadable C<${${$x[2]}[3]}[5]>.
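
As a rough sketch, the three spellings can be compared side by side.
The sub name C<cell> is invented for this example, and it takes a
reference to the table so that it is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch one cell of a two-dimensional table
# (a reference to an array of row references) in three equivalent ways.
sub cell {
    my ($rows, $r, $c) = @_;
    my $explicit = ${ $rows->[$r] }[$c];   # Use Rule 1 applied to the row reference
    my $arrowed  = $rows->[$r]->[$c];      # Use Rule 2 at every step
    my $short    = $rows->[$r][$c];        # Arrow Rule: arrow between subscripts dropped
    return ($explicit, $arrowed, $short);
}

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9] );

print join(' ', cell(\@a, 1, 2)), "\n";   # row 1, column 2, three times over
$a[0][1] = 'two';                         # assignment works the same way
print "$a[0][1]\n";
```

Note that the first arrow in C<< $rows->[$r][$c] >> has to stay: it is
not between two subscripts, and C<$rows> is a reference rather than an
array.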


=head1 Solution

Here's the answer to the problem I posed earlier, of reformatting a
file of city and country names.

    1   while (<>) {
    2     chomp;
    3     my ($city, $country) = split /, /;
    4     push @{$table{$country}}, $city;
    5   }
    6
    7   foreach $country (sort keys %table) {
    8     print "$country: ";
    9     my @cities = @{$table{$country}};
   10     print join ', ', sort @cities;
   11     print ".\n";
   12   }


The program has two pieces:  Lines 1--5 read the input and build a
data structure, and lines 7--12 analyze the data and print out the
report.  

In the first part, line 4 is the important one.  We're going to have a
hash, C<%table>, whose keys are country names, and whose values are
(references to) arrays of city names.  After acquiring a city and
country name, the program looks up C<$table{$country}>, which holds (a
reference to) the list of cities seen in that country so far.  Line 4 is
totally analogous to

	push @array, $city;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>.  The C<push> adds a city name to the end of the
referred-to array.

In the second part, line 9 is the important one.  Again,
C<$table{$country}> is (a reference to) the list of cities in the country, so
we can recover the original list, and copy it into the array C<@cities>,
by using C<@{$table{$country}}>.  Line 9 is totally analogous to

	@cities = @array;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>.  The C<@> tells Perl to get the entire array.

The rest of the program is just familiar uses of C<chomp>, C<split>, C<sort>,
and C<print>, and doesn't involve references at all.

There's one fine point I skipped.  Suppose the program has just read
the first line in its input that happens to mention Greece.
Control is at line 4, C<$country> is C<'Greece'>, and C<$city> is
C<'Athens'>.  Since this is the first city in Greece,
C<$table{$country}> is undefined---in fact there isn't a C<'Greece'> key
in C<%table> at all.  What does line 4 do here?

    4     push @{$table{$country}}, $city;


This is Perl, so it does the exact right thing.  It sees that you want
to push C<Athens> onto an array that doesn't exist, so it helpfully
makes a new, empty, anonymous array for you, installs it in the table,
and then pushes C<Athens> onto it.  This is called `autovivification'.
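
To watch autovivification happen, here is a small sketch that builds
the same kind of table from a list of strings instead of from a file.
The sub name C<build_table> and the sample cities are assumptions made
for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: build a country => [cities] table from
# "City, Country" strings, relying on autovivification in push.
sub build_table {
    my @lines = @_;
    my %table;
    for my $line (@lines) {
        my ($city, $country) = split /, /, $line;
        # For the first city in a country, $table{$country} does not
        # exist yet; Perl creates an anonymous array there before pushing.
        push @{ $table{$country} }, $city;
    }
    return \%table;
}

my $table = build_table('Athens, Greece', 'Frankfurt, Germany', 'Berlin, Germany');
for my $country (sort keys %{$table}) {
    print "$country: ", join(', ', sort @{ $table->{$country} }), ".\n";
}
```

Nowhere does the code create an empty array for a new country; the
C<push> is enough.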


=head1 The Rest

I promised to give you 90% of the benefit with 10% of the details, and
that means I left out 90% of the details.  Now that you have an
overview of the important parts, it should be easier to read the
L<perlref> manual page, which discusses 100% of the details.

Some of the highlights of L<perlref>:

=over 4

=item *

You can make references to anything, including scalars, functions, and
other references.

=item *

In B<USE RULE 1>, you can omit the curly brackets whenever the thing
inside them is an atomic scalar variable like C<$aref>.  For example,
C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
C<${$aref}[1]>.  If you're just starting out, you may want to adopt
the habit of always including the curly brackets.

=item * 

To see if a variable contains a reference, use the `ref' function.
It returns true if its argument is a reference.  Actually it's a
little better than that:  It returns HASH for hash references and
ARRAY for array references.

=item * 

If you try to use a reference like a string, you get strings like

	ARRAY(0x80f5dec)   or    HASH(0x826afc0)

If you ever see a string that looks like this, you'll know you
printed out a reference by mistake.

A side effect of this representation is that you can use C<eq> to see
if two references refer to the same thing.  (But you should usually use
C<==> instead because it's much faster.)

=item *

You can use a string as if it were a reference.  If you use the string
C<"foo"> as an array reference, it's taken to be a reference to the
array C<@foo>.  This is called a I<soft reference> or I<symbolic reference>.

=back
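
A short sketch of two of these highlights, C<ref> and comparing
references with C<==>.  The sub name C<describe> is made up for the
example.

```perl
use strict;
use warnings;

# Hypothetical helper: say what kind of thing a scalar holds, using ref().
sub describe {
    my ($thing) = @_;
    my $type = ref $thing;     # empty string (false) for a non-reference
    return $type ? "reference to $type" : "plain scalar";
}

my @array  = (1, 2, 3);
my $first  = \@array;
my $second = \@array;        # another reference to the same array
my $copy   = [ @array ];     # a different array with the same contents

print describe($first), "\n";
print describe("hello"), "\n";
print "same array\n"      if $first == $second;
print "different array\n" if $first != $copy;
```

Equal contents are not enough for C<==> to be true; the two references
have to refer to the very same array.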

You might prefer to go on to L<perllol> instead of L<perlref>; it
discusses lists of lists and multidimensional arrays in detail.  After
that, you should move on to L<perldsc>; it's a Data Structure Cookbook
that shows recipes for using and printing out arrays of hashes, hashes
of arrays, and other kinds of data.

=head1 Summary

Everyone needs compound data structures, and in Perl the way you get
them is with references.  There are four important rules for managing
references: Two for making references and two for using them.  Once
you know these rules you can do most of the important things you need
to do with references.

=head1 Credits

Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>)

This article originally appeared in I<The Perl Journal>
(http://tpj.com) volume 3, #2.  Reprinted with permission.  

The original title was I<Understand References Today>.

=head2 Distribution Conditions

Copyright 1998 The Perl Journal.

When included as part of the Standard Version of Perl, or as part of
its complete documentation whether printed or otherwise, this work may
be distributed only under the terms of Perl's Artistic License.  Any
distribution of this file or derivatives thereof outside of that
package require that special arrangements be made with copyright
holder.

Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain.  You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit.  A simple comment in the code giving credit would be
courteous but is not required.




=cut
=head1 NAME

perlrequick - Perl regular expressions quick start

=head1 DESCRIPTION

This page covers the very basics of understanding, creating and
using regular expressions ('regexes') in Perl.


=head1 The Guide

=head2 Simple word matching

The simplest regex is simply a word, or more generally, a string of
characters.  A regex consisting of a word matches any string that
contains that word:

    "Hello World" =~ /World/;  # matches

In this statement, C<World> is a regex and the C<//> enclosing
C</World/> tells perl to search a string for a match.  The operator
C<=~> associates the string with the regex match and produces a true
value if the regex matched, or false if the regex did not match.  In
our case, C<World> matches the second word in C<"Hello World">, so the
expression is true.  This idea has several variations.

Expressions like this are useful in conditionals:

    print "It matches\n" if "Hello World" =~ /World/;

The sense of the match can be reversed by using the C<!~> operator:

    print "It doesn't match\n" if "Hello World" !~ /World/;

The literal string in the regex can be replaced by a variable:

    $greeting = "World";
    print "It matches\n" if "Hello World" =~ /$greeting/;

If you're matching against C<$_>, the C<$_ =~> part can be omitted:

    $_ = "Hello World";
    print "It matches\n" if /World/;

Finally, the C<//> default delimiters for a match can be changed to
arbitrary delimiters by putting an C<'m'> out front:

    "Hello World" =~ m!World!;   # matches, delimited by '!'
    "Hello World" =~ m{World};   # matches, note the matching '{}'
    "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
                                 # '/' becomes an ordinary char

Regexes must match a part of the string I<exactly> in order for the
statement to be true:

    "Hello World" =~ /world/;  # doesn't match, case sensitive
    "Hello World" =~ /o W/;    # matches, ' ' is an ordinary char
    "Hello World" =~ /World /; # doesn't match, no ' ' at end

perl will always match at the earliest possible point in the string:

    "Hello World" =~ /o/;       # matches 'o' in 'Hello'
    "That hat is red" =~ /hat/; # matches 'hat' in 'That'

Not all characters can be used 'as is' in a match.  Some characters,
called B<metacharacters>, are reserved for use in regex notation.
The metacharacters are

    {}[]()^$.|*+?\

A metacharacter can be matched by putting a backslash before it:

    "2+2=4" =~ /2+2/;    # doesn't match, + is a metacharacter
    "2+2=4" =~ /2\+2/;   # matches, \+ is treated like an ordinary +
    'C:\WIN32' =~ /C:\\WIN/;                       # matches
    "/usr/bin/perl" =~ /\/usr\/bin\/perl/;         # matches

In the last regex, the forward slash C<'/'> is also backslashed,
because it is used to delimit the regex.

Non-printable ASCII characters are represented by B<escape sequences>.
Common examples are C<\t> for a tab, C<\n> for a newline, and C<\r>
for a carriage return.  Arbitrary bytes are represented by octal
escape sequences, e.g., C<\033>, or hexadecimal escape sequences,
e.g., C<\x1B>:

    "1000\t2000" =~ m(0\t2)        # matches
    "cat"        =~ /\143\x61\x74/ # matches, but a weird way to spell cat

Regexes are treated mostly as double quoted strings, so variable
substitution works:

    $foo = 'house';
    'cathouse' =~ /cat$foo/;   # matches
    'housecat' =~ /${foo}cat/; # matches

With all of the regexes above, if the regex matched anywhere in the
string, it was considered a match.  To specify I<where> it should
match, we would use the B<anchor> metacharacters C<^> and C<$>.  The
anchor C<^> means match at the beginning of the string and the anchor
C<$> means match at the end of the string, or before a newline at the
end of the string.  Some examples:

    "housekeeper" =~ /keeper/;         # matches
    "housekeeper" =~ /^keeper/;        # doesn't match
    "housekeeper" =~ /keeper$/;        # matches
    "housekeeper\n" =~ /keeper$/;      # matches
    "housekeeper" =~ /^housekeeper$/;  # matches

=head2 Using character classes

A B<character class> allows a set of possible characters, rather than
just a single character, to match at a particular point in a regex.
Character classes are denoted by brackets C<[...]>, with the set of
characters to be possibly matched inside.  Here are some examples:

    /cat/;            # matches 'cat'
    /[bcr]at/;        # matches 'bat', 'cat', or 'rat'
    "abc" =~ /[cab]/; # matches 'a'

In the last statement, even though C<'c'> is the first character in
the class, the earliest point at which the regex can match is C<'a'>.

    /[yY][eE][sS]/; # match 'yes' in a case-insensitive way
                    # 'yes', 'Yes', 'YES', etc.
    /yes/i;         # also match 'yes' in a case-insensitive way

The last example shows a match with an C<'i'> B<modifier>, which makes
the match case-insensitive.

Character classes also have ordinary and special characters, but the
sets of ordinary and special characters inside a character class are
different than those outside a character class.  The special
characters for a character class are C<-]\^$> and are matched using an
escape:

    /[\]c]def/; # matches ']def' or 'cdef'
    $x = 'bcr';
    /[$x]at/;   # matches 'bat', 'cat', or 'rat'
    /[\$x]at/;  # matches '$at' or 'xat'
    /[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'

The special character C<'-'> acts as a range operator within character
classes, so that the unwieldy C<[0123456789]> and C<[abc...xyz]>
become the svelte C<[0-9]> and C<[a-z]>:

    /item[0-9]/;  # matches 'item0' or ... or 'item9'
    /[0-9a-fA-F]/;  # matches a hexadecimal digit

If C<'-'> is the first or last character in a character class, it is
treated as an ordinary character.

The special character C<^> in the first position of a character class
denotes a B<negated character class>, which matches any character but
those in the brackets.  Both C<[...]> and C<[^...]> must match a
character, or the match fails.  Then

    /[^a]at/;  # doesn't match 'aat' or 'at', but matches
               # all other 'bat', 'cat', '0at', '%at', etc.
    /[^0-9]/;  # matches a non-numeric character
    /[a^]at/;  # matches 'aat' or '^at'; here '^' is ordinary

Perl has several abbreviations for common character classes:

=over 4

=item *

\d is a digit and represents [0-9]

=item *

\s is a whitespace character and represents [\ \t\r\n\f]

=item *

\w is a word character (alphanumeric or _) and represents [0-9a-zA-Z_]

=item *

\D is a negated \d; it represents any character but a digit [^0-9]

=item *

\S is a negated \s; it represents any non-whitespace character [^\s]

=item *

\W is a negated \w; it represents any non-word character [^\w]

=item *

The period '.' matches any character but "\n"

=back

The C<\d\s\w\D\S\W> abbreviations can be used both inside and outside
of character classes.  Here are some in use:

    /\d\d:\d\d:\d\d/; # matches a hh:mm:ss time format
    /[\d\s]/;         # matches any digit or whitespace character
    /\w\W\w/;         # matches a word char, followed by a
                      # non-word char, followed by a word char
    /..rt/;           # matches any two chars, followed by 'rt'
    /end\./;          # matches 'end.'
    /end[.]/;         # same thing, matches 'end.'

The S<B<word anchor> > C<\b> matches a boundary between a word
character and a non-word character C<\w\W> or C<\W\w>:

    $x = "Housecat catenates house and cat";
    $x =~ /\bcat/;  # matches cat in 'catenates'
    $x =~ /cat\b/;  # matches cat in 'Housecat'
    $x =~ /\bcat\b/;  # matches 'cat' at end of string

In the last example, the end of the string is considered a word
boundary.

=head2 Matching this or that

We can match different character strings with the B<alternation>
metacharacter C<'|'>.  To match C<dog> or C<cat>, we form the regex
C<dog|cat>.  As before, perl will try to match the regex at the
earliest possible point in the string.  At each character position,
perl will first try to match the first alternative, C<dog>.  If
C<dog> doesn't match, perl will then try the next alternative, C<cat>.
If C<cat> doesn't match either, then the match fails and perl moves to
the next position in the string.  Some examples:

    "cats and dogs" =~ /cat|dog|bird/;  # matches "cat"
    "cats and dogs" =~ /dog|cat|bird/;  # matches "cat"

Even though C<dog> is the first alternative in the second regex,
C<cat> is able to match earlier in the string.

    "cats"          =~ /c|ca|cat|cats/; # matches "c"
    "cats"          =~ /cats|cat|ca|c/; # matches "cats"

At a given character position, the first alternative that allows the
regex match to succeed will be the one that matches. Here, all the
alternatives match at the first string position, so the first matches.
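
A minimal sketch that makes these two rules visible by capturing what
the alternation actually matched.  The sub name C<matched_text> is
hypothetical, and it wraps the alternation in grouping parentheses and
reads C<$1>, both of which are covered in the sections that follow.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text an alternation actually matched,
# or undef if it did not match at all.
sub matched_text {
    my ($string, $regex) = @_;
    return $string =~ /($regex)/ ? $1 : undef;
}

print matched_text("cats and dogs", 'dog|cat|bird'), "\n";  # earliest position wins
print matched_text("cats", 'c|ca|cat|cats'), "\n";          # then the first alternative
print matched_text("cats", 'cats|cat|ca|c'), "\n";          # same words, other order
```

The three calls print C<cat>, C<c>, and C<cats>: position in the string
is decided first, and the order of the alternatives only matters once
the position is fixed.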

=head2 Grouping things and hierarchical matching

The B<grouping> metacharacters C<()> allow a part of a regex to be
treated as a single unit.  Parts of a regex are grouped by enclosing
them in parentheses.  The regex C<house(cat|keeper)> means match
C<house> followed by either C<cat> or C<keeper>.  Some more examples
are

    /(a|b)b/;    # matches 'ab' or 'bb'
    /(^a|b)c/;   # matches 'ac' at start of string or 'bc' anywhere

    /house(cat|)/;  # matches either 'housecat' or 'house'
    /house(cat(s|)|)/;  # matches either 'housecats' or 'housecat' or
                        # 'house'.  Note groups can be nested.

    "20" =~ /(19|20|)\d\d/;  # matches the null alternative '()\d\d',
                             # because '20\d\d' can't match

=head2 Extracting matches

The grouping metacharacters C<()> also allow the extraction of the
parts of a string that matched.  For each grouping, the part that
matched inside goes into the special variables C<$1>, C<$2>, etc.
They can be used just as ordinary variables:

    # extract hours, minutes, seconds
    $time =~ /(\d\d):(\d\d):(\d\d)/;  # match hh:mm:ss format
    $hours = $1;
    $minutes = $2;
    $seconds = $3;

In list context, a match C</regex/> with groupings will return the
list of matched values C<($1,$2,...)>.  So we could rewrite it as

    ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);

If the groupings in a regex are nested, C<$1> gets the group with the
leftmost opening parenthesis, C<$2> the next opening parenthesis,
etc.  For example, here is a complex regex and the matching variables
indicated below it:

    /(ab(cd|ef)((gi)|j))/;
     1  2      34

Associated with the matching variables C<$1>, C<$2>, ... are
the B<backreferences> C<\1>, C<\2>, ...  Backreferences are
matching variables that can be used I<inside> a regex:

    /(\w\w\w)\s\1/; # find sequences like 'the the' in string

C<$1>, C<$2>, ... should only be used outside of a regex, and C<\1>,
C<\2>, ... only inside a regex.
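
The numbering of nested groups is easy to check by matching in list
context.  This is only a sketch; the sub names C<nested_groups> and
C<repeated_three> and the test strings are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return ($1, $2, $3, $4) for the nested regex above.
sub nested_groups {
    my ($string) = @_;
    my @groups = $string =~ /(ab(cd|ef)((gi)|j))/;
    return @groups;
}

# Hypothetical helper: find a three-letter sequence repeated after one
# whitespace character, using the backreference \1 inside the regex.
sub repeated_three {
    my ($string) = @_;
    return $string =~ /(\w\w\w)\s\1/ ? $1 : undef;
}

for my $string ("abcdgi", "abefj") {
    my @groups = map { defined $_ ? $_ : 'undef' } nested_groups($string);
    print join('|', @groups), "\n";
}
print repeated_three("it is the the end"), "\n";
```

For C<"abefj"> the fourth group takes no part in the match, so its slot
in the list is C<undef>.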

=head2 Matching repetitions

The B<quantifier> metacharacters C<?>, C<*>, C<+>, and C<{}> allow us
to determine the number of repeats of a portion of a regex we
consider to be a match.  Quantifiers are put immediately after the
character, character class, or grouping that we want to specify.  They
have the following meanings:

=over 4

=item *

C<a?> = match 'a' 1 or 0 times

=item *

C<a*> = match 'a' 0 or more times, i.e., any number of times

=item *

C<a+> = match 'a' 1 or more times, i.e., at least once

=item *

C<a{n,m}> = match at least C<n> times, but not more than C<m>
times.

=item *

C<a{n,}> = match at least C<n> times

=item *

C<a{n}> = match exactly C<n> times

=back

Here are some examples:

    /[a-z]+\s+\d*/;  # match a lowercase word, at least some space, and
                     # any number of digits
    /(\w+)\s+\1/;    # match doubled words of arbitrary length
    $year =~ /^\d{2,4}$/;  # make sure year is at least 2 but not more
                           # than 4 digits
    $year =~ /^(\d{4}|\d{2})$/;  # better match; throw out 3 digit dates

These quantifiers will try to match as much of the string as possible,
while still allowing the regex to match.  So we have

    $x = 'the cat in the hat';
    $x =~ /^(.*)(at)(.*)$/; # matches,
                            # $1 = 'the cat in the h'
                            # $2 = 'at'
                            # $3 = ''   (0 matches)

The first quantifier C<.*> grabs as much of the string as possible
while still having the regex match. The second quantifier C<.*> has
no string left to it, so it matches 0 times.
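
Here is a small sketch of greedy matching and of a counted quantifier.
The sub names C<greedy_split> and C<valid_year> are made up, and
C<valid_year> assumes that the whole string should be the year, which is
why it uses the C<^> and C<$> anchors.

```perl
use strict;
use warnings;

# Hypothetical helper: return the three captured pieces of the greedy match.
sub greedy_split {
    my ($string) = @_;
    return $string =~ /^(.*)(at)(.*)$/;
}

# Hypothetical helper: accept only strings that are exactly 2 or 4 digits.
sub valid_year {
    my ($year) = @_;
    return $year =~ /^(\d{4}|\d{2})$/ ? 1 : 0;
}

print "[$_]\n" for greedy_split('the cat in the hat');
print "$_: ", valid_year($_), "\n" for '99', '123', '1999';
```

The first C<.*> takes everything up to the last C<at>, not the first
one, which leaves nothing for the second C<.*>.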

=head2 More matching

There are a few more things you might want to know about matching
operators.  In the code

    $pattern = 'Seuss';
    while (<>) {
        print if /$pattern/;
    }

perl has to re-evaluate C<$pattern> each time through the loop.  If
C<$pattern> won't be changing, use the C<//o> modifier, to only
perform variable substitutions once.  If you don't want any
substitutions at all, use the special delimiter C<m''>:

    $pattern = 'Seuss';
    m'$pattern'; # matches '$pattern', not 'Seuss'

The global modifier C<//g> allows the matching operator to match
within a string as many times as possible.  In scalar context,
successive matches against a string will have C<//g> jump from match
to match, keeping track of position in the string as it goes along.
You can get or set the position with the C<pos()> function.
For example,

    $x = "cat dog house"; # 3 words
    while ($x =~ /(\w+)/g) {
        print "Word is $1, ends at position ", pos $x, "\n";
    }

prints

    Word is cat, ends at position 3
    Word is dog, ends at position 7
    Word is house, ends at position 13

A failed match or changing the target string resets the position.  If
you don't want the position reset after failure to match, add the
C<//c> modifier, as in C</regex/gc>.

In list context, C<//g> returns a list of matched groupings, or if
there are no groupings, a list of matches to the whole regex.  So

    @words = ($x =~ /(\w+)/g);  # matches,
                                # $words[0] = 'cat'
                                # $words[1] = 'dog'
                                # $words[2] = 'house'
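
The scalar-context and list-context forms of C<//g> can be put next to
each other in a short sketch.  The sub name C<word_positions> is
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a string with //g in scalar context and
# collect [word, end position] pairs.
sub word_positions {
    my ($string) = @_;
    my @found;
    while ($string =~ /(\w+)/g) {
        push @found, [ $1, pos($string) ];   # pos() is where the last match ended
    }
    return @found;
}

my $x = "cat dog house";
for my $pair (word_positions($x)) {
    print "Word is $pair->[0], ends at position $pair->[1]\n";
}

my @words = ($x =~ /(\w+)/g);    # list context: all the matches at once
print scalar(@words), " words: @words\n";
```

The loop is the form to reach for when the position of each match
matters; the list-context form is shorter when only the matched text is
needed.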

=head2 Search and replace

Search and replace is performed using C<s/regex/replacement/modifiers>.
The C<replacement> is a Perl double quoted string that replaces in the
string whatever is matched with the C<regex>.  The operator C<=~> is
also used here to associate a string with C<s///>.  If matching
against C<$_>, the S<C<$_ =~> > can be dropped.  If there is a match,
C<s///> returns the number of substitutions made, otherwise it returns
false.  Here are a few examples:

    $x = "Time to feed the cat!";
    $x =~ s/cat/hacker/;   # $x contains "Time to feed the hacker!"
    $y = "'quoted words'";
    $y =~ s/^'(.*)'$/$1/;  # strip single quotes,
                           # $y contains "quoted words"

With the C<s///> operator, the matched variables C<$1>, C<$2>, etc.
are immediately available for use in the replacement expression. With
the global modifier, C<s///g> will search and replace all occurrences
of the regex in the string:

    $x = "I batted 4 for 4";
    $x =~ s/4/four/;   # $x contains "I batted four for 4"
    $x = "I batted 4 for 4";
    $x =~ s/4/four/g;  # $x contains "I batted four for four"

The evaluation modifier C<s///e> wraps an C<eval{...}> around the
replacement string and the evaluated result is substituted for the
matched substring.  Some examples:

    # reverse all the words in a string
    $x = "the cat in the hat";
    $x =~ s/(\w+)/reverse $1/ge;   # $x contains "eht tac ni eht tah"

    # convert percentage to decimal
    $x = "A 39% hit rate";
    $x =~ s!(\d+)%!$1/100!e;       # $x contains "A 0.39 hit rate"

The last example shows that C<s///> can use other delimiters, such as
C<s!!!> and C<s{}{}>, and even C<s{}//>.  If single quotes are used
C<s'''>, then the regex and replacement are treated as single quoted
strings.
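
A brief sketch of the return value of C<s///g> and of the C<e>
modifier combined with C<g>.  The sub names C<spell_fours> and
C<percent_to_decimal> are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every 4 and report how many were replaced.
sub spell_fours {
    my ($text) = @_;
    my $count = ($text =~ s/4/four/g);   # number of substitutions, or false
    return ($count, $text);
}

# Hypothetical helper: turn every percentage in a string into a decimal.
sub percent_to_decimal {
    my ($text) = @_;
    $text =~ s!(\d+)%!$1/100!ge;         # the replacement is evaluated as code
    return $text;
}

my ($count, $text) = spell_fours("I batted 4 for 4");
print "$count substitutions: $text\n";
print percent_to_decimal("A 39% hit rate, up from 25%"), "\n";
```

Both subs work on a copy of their argument, so the caller's string is
left untouched.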

=head2 The split operator

C<split /regex/, string> splits C<string> into a list of substrings
and returns that list.  The regex determines the character sequence
that C<string> is split with respect to.  For example, to split a
string into words, use

    $x = "Calvin and Hobbes";
    @word = split /\s+/, $x;  # $word[0] = 'Calvin'
                              # $word[1] = 'and'
                              # $word[2] = 'Hobbes'

To extract a comma-delimited list of numbers, use

    $x = "1.618,2.718,   3.142";
    @const = split /,\s*/, $x;  # $const[0] = '1.618'
                                # $const[1] = '2.718'
                                # $const[2] = '3.142'

If the empty regex C<//> is used, the string is split into individual
characters.  If the regex has groupings, then the list produced contains
the matched substrings from the groupings as well:

    $x = "/usr/bin";
    @parts = split m!(/)!, $x;  # $parts[0] = ''
                                # $parts[1] = '/'
                                # $parts[2] = 'usr'
                                # $parts[3] = '/'
                                # $parts[4] = 'bin'

Since the first character of $x matched the regex, C<split> prepended
an empty initial element to the list.
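
The effect of a grouping in the C<split> regex is easiest to see by
printing the pieces.  This is a sketch only; the sub names
C<split_path> and C<split_numbers> are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a path on '/', keeping the separators
# because the regex contains a grouping.
sub split_path {
    my ($path) = @_;
    return split m!(/)!, $path;
}

# Hypothetical helper: split a comma-delimited list, discarding any
# whitespace that follows each comma.
sub split_numbers {
    my ($list) = @_;
    return split /,\s*/, $list;
}

print join('|', map { "'$_'" } split_path("/usr/bin")), "\n";
print join('|', split_numbers("1.618,2.718,   3.142")), "\n";
```

Without the parentheses in C<split_path>, the separators would be
dropped and only the empty leading element, C<usr>, and C<bin> would
come back.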

=head1 BUGS

None.

=head1 SEE ALSO

This is just a quick start guide.  For a more in-depth tutorial on
regexes, see L<perlretut> and for the reference page, see L<perlre>.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 2000 Mark Kvale
All rights reserved.

This document may be distributed under the same terms as Perl itself.

=head2 Acknowledgments

The author would like to thank Mark-Jason Dominus, Tom Christiansen,
Ilya Zakharevich, Brad Hughes, and Mike Giroux for all their helpful
comments.

=cut

=head1 NAME

perlretut - Perl regular expressions tutorial

=head1 DESCRIPTION

This page provides a basic tutorial on understanding, creating and
using regular expressions in Perl.  It serves as a complement to the
reference page on regular expressions L<perlre>.  Regular expressions
are an integral part of the C<m//>, C<s///>, C<qr//> and C<split>
operators and so this tutorial also overlaps with
L<perlop/"Regexp Quote-Like Operators"> and L<perlfunc/split>.

Perl is widely renowned for excellence in text processing, and regular
expressions are one of the big factors behind this fame.  Perl regular
expressions display an efficiency and flexibility unknown in most
other computer languages.  Mastering even the basics of regular
expressions will allow you to manipulate text with surprising ease.

What is a regular expression?  A regular expression is simply a string
that describes a pattern.  Patterns are in common use these days;
examples are the patterns typed into a search engine to find web pages
and the patterns used to list files in a directory, e.g., C<ls *.txt>
or C<dir *.*>.  In Perl, the patterns described by regular expressions
are used to search strings, extract desired parts of strings, and to
do search and replace operations.

Regular expressions have the undeserved reputation of being abstract
and difficult to understand.  Regular expressions are constructed using
simple concepts like conditionals and loops and are no more difficult
to understand than the corresponding C<if> conditionals and C<while>
loops in the Perl language itself.  In fact, the main challenge in
learning regular expressions is just getting used to the terse
notation used to express these concepts.

This tutorial flattens the learning curve by discussing regular
expression concepts, along with their notation, one at a time and with
many examples.  The first part of the tutorial will progress from the
simplest word searches to the basic regular expression concepts.  If
you master the first part, you will have all the tools needed to solve
about 98% of your needs.  The second part of the tutorial is for those
comfortable with the basics and hungry for more power tools.  It
discusses the more advanced regular expression operators and
introduces the latest cutting edge innovations in 5.6.0.

A note: to save time, 'regular expression' is often abbreviated as
regexp or regex.  Regexp is a more natural abbreviation than regex, but
is harder to pronounce.  The Perl pod documentation is evenly split on
regexp vs regex; in Perl, there is more than one way to abbreviate it.
We'll use regexp in this tutorial.

=head1 Part 1: The basics

=head2 Simple word matching

The simplest regexp is simply a word, or more generally, a string of
characters.  A regexp consisting of a word matches any string that
contains that word:

    "Hello World" =~ /World/;  # matches

What is this perl statement all about? C<"Hello World"> is a simple
double quoted string.  C<World> is the regular expression and the
C<//> enclosing C</World/> tells perl to search a string for a match.
The operator C<=~> associates the string with the regexp match and
produces a true value if the regexp matched, or false if the regexp
did not match.  In our case, C<World> matches the second word in
C<"Hello World">, so the expression is true.  Expressions like this
are useful in conditionals:

    if ("Hello World" =~ /World/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }

There are useful variations on this theme.  The sense of the match can
be reversed by using the C<!~> operator:

    if ("Hello World" !~ /World/) {
        print "It doesn't match\n";
    }
    else {
        print "It matches\n";
    }

The literal string in the regexp can be replaced by a variable:

    $greeting = "World";
    if ("Hello World" =~ /$greeting/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }

If you're matching against the special default variable C<$_>, the
C<$_ =~> part can be omitted:

    $_ = "Hello World";
    if (/World/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }

And finally, the C<//> default delimiters for a match can be changed
to arbitrary delimiters by putting an C<'m'> out front:

    "Hello World" =~ m!World!;   # matches, delimited by '!'
    "Hello World" =~ m{World};   # matches, note the matching '{}'
    "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
                                 # '/' becomes an ordinary char

C</World/>, C<m!World!>, and C<m{World}> all represent the
same thing.  When, e.g., C<""> is used as a delimiter, the forward
slash C<'/'> becomes an ordinary character and can be used in a regexp
without trouble.

Let's consider how different regexps would match C<"Hello World">:

    "Hello World" =~ /world/;  # doesn't match
    "Hello World" =~ /o W/;    # matches
    "Hello World" =~ /oW/;     # doesn't match
    "Hello World" =~ /World /; # doesn't match

The first regexp C<world> doesn't match because regexps are
case-sensitive.  The second regexp matches because the substring
S<C<'o W'> > occurs in the string S<C<"Hello World"> >.  The space
character ' ' is treated like any other character in a regexp and is
needed to match in this case.  The lack of a space character is the
reason the third regexp C<'oW'> doesn't match.  The fourth regexp
C<'World '> doesn't match because there is a space at the end of the
regexp, but not at the end of the string.  The lesson here is that
regexps must match a part of the string I<exactly> in order for the
statement to be true.

If a regexp matches in more than one place in the string, perl will
always match at the earliest possible point in the string:

    "Hello World" =~ /o/;       # matches 'o' in 'Hello'
    "That hat is red" =~ /hat/; # matches 'hat' in 'That'

With respect to character matching, there are a few more points you
need to know about.   First of all, not all characters can be used 'as
is' in a match.  Some characters, called B<metacharacters>, are reserved
for use in regexp notation.  The metacharacters are

    {}[]()^$.|*+?\

The significance of each of these will be explained
in the rest of the tutorial, but for now, it is important only to know
that a metacharacter can be matched by putting a backslash before it:

    "2+2=4" =~ /2+2/;    # doesn't match, + is a metacharacter
    "2+2=4" =~ /2\+2/;   # matches, \+ is treated like an ordinary +
    "The interval is [0,1)." =~ /[0,1)./     # is a syntax error!
    "The interval is [0,1)." =~ /\[0,1\)\./  # matches
    "/usr/bin/perl" =~ /\/usr\/bin\/perl/;  # matches

In the last regexp, the forward slash C<'/'> is also backslashed,
because it is used to delimit the regexp.  This can lead to LTS
(leaning toothpick syndrome), however, and it is often more readable
to change delimiters.


The backslash character C<'\'> is a metacharacter itself and needs to
be backslashed:

    'C:\WIN32' =~ /C:\\WIN/;   # matches

In addition to the metacharacters, there are some ASCII characters
which don't have printable character equivalents and are instead
represented by B<escape sequences>.  Common examples are C<\t> for a
tab, C<\n> for a newline, C<\r> for a carriage return and C<\a> for a
bell.  If your string is better thought of as a sequence of arbitrary
bytes, the octal escape sequence, e.g., C<\033>, or hexadecimal escape
sequence, e.g., C<\x1B> may be a more natural representation for your
bytes.  Here are some examples of escapes:

    "1000\t2000" =~ m(0\t2)   # matches
    "1000\n2000" =~ /0\n20/   # matches
    "1000\t2000" =~ /\000\t2/ # doesn't match, "0" ne "\000"
    "cat"        =~ /\143\x61\x74/ # matches, but a weird way to spell cat

If you've been around Perl a while, all this talk of escape sequences
may seem familiar.  Similar escape sequences are used in double-quoted
strings and in fact the regexps in Perl are mostly treated as
double-quoted strings.  This means that variables can be used in
regexps as well.  Just like double-quoted strings, the values of the
variables in the regexp will be substituted in before the regexp is
evaluated for matching purposes.  So we have:

    $foo = 'house';
    'housecat' =~ /$foo/;      # matches
    'cathouse' =~ /cat$foo/;   # matches
    'housecat' =~ /${foo}cat/; # matches

So far, so good.  With the knowledge above you can already perform
searches with just about any literal string regexp you can dream up.
Here is a I<very simple> emulation of the Unix grep program:

    % cat > simple_grep
    #!/usr/bin/perl
    $regexp = shift;
    while (<>) {
        print if /$regexp/;
    }
    ^D

    % chmod +x simple_grep

    % simple_grep abba /usr/dict/words
    Babbage
    cabbage
    cabbages
    sabbath
    Sabbathize
    Sabbathizes
    sabbatical
    scabbard
    scabbards

This program is easy to understand.  C<#!/usr/bin/perl> is the standard
way to invoke a perl program from the shell.
S<C<$regexp = shift;> > saves the first command line argument as the
regexp to be used, leaving the rest of the command line arguments to
be treated as files.  S<C<< while (<>) >> > loops over all the lines in
all the files.  For each line, S<C<print if /$regexp/;> > prints the
line if the regexp matches the line.  In this line, both C<print> and
C</$regexp/> use the default variable C<$_> implicitly.

With all of the regexps above, if the regexp matched anywhere in the
string, it was considered a match.  Sometimes, however, we'd like to
specify I<where> in the string the regexp should try to match.  To do
this, we would use the B<anchor> metacharacters C<^> and C<$>.  The
anchor C<^> means match at the beginning of the string and the anchor
C<$> means match at the end of the string, or before a newline at the
end of the string.  Here is how they are used:

    "housekeeper" =~ /keeper/;    # matches
    "housekeeper" =~ /^keeper/;   # doesn't match
    "housekeeper" =~ /keeper$/;   # matches
    "housekeeper\n" =~ /keeper$/; # matches

The second regexp doesn't match because C<^> constrains C<keeper> to
match only at the beginning of the string, but C<"housekeeper"> has
keeper starting in the middle.  The third regexp does match, since the
C<$> constrains C<keeper> to match only at the end of the string.

When both C<^> and C<$> are used at the same time, the regexp has to
match both the beginning and the end of the string, i.e., the regexp
matches the whole string.  Consider

    "keeper" =~ /^keep$/;      # doesn't match
    "keeper" =~ /^keeper$/;    # matches
    ""       =~ /^$/;          # ^$ matches an empty string

The first regexp doesn't match because the string has more to it than
C<keep>.  Since the second regexp is exactly the string, it
matches.  Using both C<^> and C<$> in a regexp forces the complete
string to match, so it gives you complete control over which strings
match and which don't.  Suppose you are looking for a fellow named
bert, off in a string by himself:

    "dogbert" =~ /bert/;   # matches, but not what you want

    "dilbert" =~ /^bert/;  # doesn't match, but ..
    "bertram" =~ /^bert/;  # matches, so still not good enough

    "bertram" =~ /^bert$/; # doesn't match, good
    "dilbert" =~ /^bert$/; # doesn't match, good
    "bert"    =~ /^bert$/; # matches, perfect

Of course, in the case of a literal string, one could just as easily
use the string equivalence S<C<$string eq 'bert'> > and it would be
more efficient.   The  C<^...$> regexp really becomes useful when we
add in the more powerful regexp tools below.

=head2 Using character classes

Although one can already do quite a lot with the literal string
regexps above, we've only scratched the surface of regular expression
technology.  In this and subsequent sections we will introduce regexp
concepts (and associated metacharacter notations) that will allow a
regexp to not just represent a single character sequence, but a I<whole
class> of them.

One such concept is that of a B<character class>.  A character class
allows a set of possible characters, rather than just a single
character, to match at a particular point in a regexp.  Character
classes are denoted by brackets C<[...]>, with the set of characters
to be possibly matched inside.  Here are some examples:

    /cat/;       # matches 'cat'
    /[bcr]at/;   # matches 'bat', 'cat', or 'rat'
    /item[0123456789]/;  # matches 'item0' or ... or 'item9'
    "abc" =~ /[cab]/;    # matches 'a'

In the last statement, even though C<'c'> is the first character in
the class, C<'a'> matches because the first character position in the
string is the earliest point at which the regexp can match.

    /[yY][eE][sS]/;      # match 'yes' in a case-insensitive way
                         # 'yes', 'Yes', 'YES', etc.

This regexp displays a common task: perform a case-insensitive
match.  Perl provides a way of avoiding all those brackets by simply
appending an C<'i'> to the end of the match.  Then C</[yY][eE][sS]/;>
can be rewritten as C</yes/i;>.  The C<'i'> stands for
case-insensitive and is an example of a B<modifier> of the matching
operation.  We will meet other modifiers later in the tutorial.

We saw in the section above that there were ordinary characters, which
represented themselves, and special characters, which needed a
backslash C<\> to represent themselves.  The same is true in a
character class, but the sets of ordinary and special characters
inside a character class are different than those outside a character
class.  The special characters for a character class are C<-]\^$>.  C<]>
is special because it denotes the end of a character class.  C<$> is
special because it denotes a scalar variable.  C<\> is special because
it is used in escape sequences, just like above.  Here is how the
special characters C<]$\> are handled:

   /[\]c]def/; # matches ']def' or 'cdef'
   $x = 'bcr';
   /[$x]at/;   # matches 'bat', 'cat', or 'rat'
   /[\$x]at/;  # matches '$at' or 'xat'
   /[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'

The last two are a little tricky.  In C<[\$x]>, the backslash protects
the dollar sign, so the character class has two members C<$> and C<x>.
In C<[\\$x]>, the backslash is protected, so C<$x> is treated as a
variable and substituted in double quote fashion.
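
One way to convince yourself of these rules is to run each class
against a handful of strings.  The following is only a minimal sketch;
the test strings and the variable names are made up for illustration:

    use strict;
    use warnings;

    my $x = 'bcr';
    my @strings = ('$at', 'xat', '\at', 'bat');

    # [\$x]  : the dollar sign is protected, so the class is '$' and 'x'
    my @protected_dollar = grep { /[\$x]at/ } @strings;

    # [\\$x] : the backslash is protected, so $x is interpolated and
    #          the class is '\', 'b', 'c', and 'r'
    my @protected_backslash = grep { /[\\$x]at/ } @strings;

    print "protected dollar:    @protected_dollar\n";     # $at xat
    print "protected backslash: @protected_backslash\n";  # \at bat

Each string is matched by exactly one of the two classes, which shows
that the two classes really do have different members.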

The special character C<'-'> acts as a range operator within character
classes, so that a contiguous set of characters can be written as a
range.  With ranges, the unwieldy C<[0123456789]> and C<[abc...xyz]>
become the svelte C<[0-9]> and C<[a-z]>.  Some examples are

    /item[0-9]/;  # matches 'item0' or ... or 'item9'
    /[0-9bx-z]aa/;  # matches '0aa', ..., '9aa',
                    # 'baa', 'xaa', 'yaa', or 'zaa'
    /[0-9a-fA-F]/;  # matches a hexadecimal digit
    /[0-9a-zA-Z_]/; # matches a "word" character,
                    # like those in a perl variable name

If C<'-'> is the first or last character in a character class, it is
treated as an ordinary character; C<[-ab]>, C<[ab-]> and C<[a\-b]> are
all equivalent.
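
The equivalence is easy to check.  In this small sketch the three
spellings are stored as C<qr//> objects (the array name C<@classes> is
our own choice) and each is tried against a few single characters:

    use strict;
    use warnings;

    # Three spellings of the same class: the characters '-', 'a', 'b'
    my @classes = (qr/[-ab]/, qr/[ab-]/, qr/[a\-b]/);

    for my $char ('-', 'a', 'b', 'c') {
        my @results = map { $char =~ $_ ? 'yes' : 'no' } @classes;
        print "$char: @results\n";
    }

    # prints:
    #   -: yes yes yes
    #   a: yes yes yes
    #   b: yes yes yes
    #   c: no no no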

The special character C<^> in the first position of a character class
denotes a B<negated character class>, which matches any character but
those in the brackets.  Both C<[...]> and C<[^...]> must match a
character, or the match fails.  Then

    /[^a]at/;  # doesn't match 'aat' or 'at', but matches
               # all other 'bat', 'cat', '0at', '%at', etc.
    /[^0-9]/;  # matches a non-numeric character
    /[a^]at/;  # matches 'aat' or '^at'; here '^' is ordinary

Now, even C<[0-9]> can be a bother to write multiple times, so in the
interest of saving keystrokes and making regexps more readable, Perl
has several abbreviations for common character classes:

=over 4

=item *

\d is a digit and represents [0-9]

=item *

\s is a whitespace character and represents [\ \t\r\n\f]

=item *

\w is a word character (alphanumeric or _) and represents [0-9a-zA-Z_]

=item *

\D is a negated \d; it represents any character but a digit [^0-9]

=item *

\S is a negated \s; it represents any non-whitespace character [^\s]

=item *

\W is a negated \w; it represents any non-word character [^\w]

=item *

The period '.' matches any character but "\n"

=back

The C<\d\s\w\D\S\W> abbreviations can be used both inside and outside
of character classes.  Here are some in use:

    /\d\d:\d\d:\d\d/; # matches a hh:mm:ss time format
    /[\d\s]/;         # matches any digit or whitespace character
    /\w\W\w/;         # matches a word char, followed by a
                      # non-word char, followed by a word char
    /..rt/;           # matches any two chars, followed by 'rt'
    /end\./;          # matches 'end.'
    /end[.]/;         # same thing, matches 'end.'

Because a period is a metacharacter, it needs to be escaped to match
as an ordinary period. Because, for example, C<\d> and C<\w> are sets
of characters, it is incorrect to think of C<[^\d\w]> as C<[\D\W]>; in
fact C<[^\d\w]> is the same as C<[^\w]>, which is the same as
C<[\W]>. Think DeMorgan's laws.
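
To see the difference concretely, here is a minimal sketch that tries
the three classes on a digit, a letter, and a punctuation character.
The sample characters and the hash name C<%matches> are arbitrary:

    use strict;
    use warnings;

    my %matches;   # character => results for [^\d\w], [\W], [\D\W]
    for my $char ('7', 'x', '!') {
        $matches{$char} = [
            ($char =~ /[^\d\w]/ ? 1 : 0),  # neither digit nor word char
            ($char =~ /[\W]/    ? 1 : 0),  # not a word char
            ($char =~ /[\D\W]/  ? 1 : 0),  # non-digit or non-word char
        ];
        printf "%s: [^\\d\\w]=%d  [\\W]=%d  [\\D\\W]=%d\n",
            $char, @{ $matches{$char} };
    }

    # prints:
    #   7: [^\d\w]=0  [\W]=0  [\D\W]=0
    #   x: [^\d\w]=0  [\W]=0  [\D\W]=1
    #   !: [^\d\w]=1  [\W]=1  [\D\W]=1

The first two classes always agree.  Only C<[\D\W]> matches C<'x'>: a
letter is a non-digit, so it belongs to the union of C<\D> and C<\W>.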

An anchor useful in basic regexps is the S<B<word anchor> >
C<\b>.  This matches a boundary between a word character and a non-word
character C<\w\W> or C<\W\w>:

    $x = "Housecat catenates house and cat";
    $x =~ /cat/;    # matches cat in 'Housecat'
    $x =~ /\bcat/;  # matches cat in 'catenates'
    $x =~ /cat\b/;  # matches cat in 'Housecat'
    $x =~ /\bcat\b/;  # matches 'cat' at end of string

Note in the last example, the end of the string is considered a word
boundary.
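
If you want to verify which C<cat> each regexp picks, you can print the
offset at which the match starts.  The sketch below borrows C<$-[0]>,
the start offset of the last successful match, which is described
later in this tutorial; the hash name C<%start> is our own:

    use strict;
    use warnings;

    my $x = "Housecat catenates house and cat";

    # Record the offset at which each pattern first matches
    my %start;
    for my $re ('cat', '\bcat', 'cat\b', '\bcat\b') {
        $start{$re} = $x =~ /$re/ ? $-[0] : undef;
        print "/$re/ matches at offset $start{$re}\n";
    }

    # prints:
    #   /cat/ matches at offset 5
    #   /\bcat/ matches at offset 9
    #   /cat\b/ matches at offset 5
    #   /\bcat\b/ matches at offset 29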

You might wonder why C<'.'> matches everything but C<"\n"> - why not
every character? The reason is that often one is matching against
lines and would like to ignore the newline characters.  For instance,
while the string C<"\n"> represents one line, we would like to think
of it as empty.  Then

    ""   =~ /^$/;    # matches
    "\n" =~ /^$/;    # matches, "\n" is ignored

    ""   =~ /./;      # doesn't match; it needs a char
    ""   =~ /^.$/;    # doesn't match; it needs a char
    "\n" =~ /^.$/;    # doesn't match; it needs a char other than "\n"
    "a"  =~ /^.$/;    # matches
    "a\n"  =~ /^.$/;  # matches, ignores the "\n"

This behavior is convenient, because we usually want to ignore
newlines when we count and match characters in a line.  Sometimes,
however, we want to keep track of newlines.  We might even want C<^>
and C<$> to anchor at the beginning and end of lines within the
string, rather than just the beginning and end of the string.  Perl
allows us to choose between ignoring and paying attention to newlines
by using the C<//s> and C<//m> modifiers.  C<//s> and C<//m> stand for
single line and multi-line and they determine whether a string is to
be treated as one continuous string, or as a set of lines.  The two
modifiers affect two aspects of how the regexp is interpreted: 1) how
the C<'.'> character class is defined, and 2) where the anchors C<^>
and C<$> are able to match.  Here are the four possible combinations:

=over 4

=item *

no modifiers (//): Default behavior.  C<'.'> matches any character
except C<"\n">.  C<^> matches only at the beginning of the string and
C<$> matches only at the end or before a newline at the end.

=item *

s modifier (//s): Treat string as a single long line.  C<'.'> matches
any character, even C<"\n">.  C<^> matches only at the beginning of
the string and C<$> matches only at the end or before a newline at the
end.

=item *

m modifier (//m): Treat string as a set of multiple lines.  C<'.'>
matches any character except C<"\n">.  C<^> and C<$> are able to match
at the start or end of I<any> line within the string.

=item *

both s and m modifiers (//sm): Treat string as a single long line, but
detect multiple lines.  C<'.'> matches any character, even
C<"\n">.  C<^> and C<$>, however, are able to match at the start or end
of I<any> line within the string.

=back

Here are examples of C<//s> and C<//m> in action:

    $x = "There once was a girl\nWho programmed in Perl\n";

    $x =~ /^Who/;   # doesn't match, "Who" not at start of string
    $x =~ /^Who/s;  # doesn't match, "Who" not at start of string
    $x =~ /^Who/m;  # matches, "Who" at start of second line
    $x =~ /^Who/sm; # matches, "Who" at start of second line

    $x =~ /girl.Who/;   # doesn't match, "." doesn't match "\n"
    $x =~ /girl.Who/s;  # matches, "." matches "\n"
    $x =~ /girl.Who/m;  # doesn't match, "." doesn't match "\n"
    $x =~ /girl.Who/sm; # matches, "." matches "\n"

Most of the time, the default behavior is what is wanted, but C<//s> and
C<//m> are occasionally very useful.  If C<//m> is being used, the start
of the string can still be matched with C<\A> and the end of string
can still be matched with the anchors C<\Z> (matches both the end and
the newline before, like C<$>), and C<\z> (matches only the end):

    $x =~ /^Who/m;   # matches, "Who" at start of second line
    $x =~ /\AWho/m;  # doesn't match, "Who" is not at start of string

    $x =~ /girl$/m;  # matches, "girl" at end of first line
    $x =~ /girl\Z/m; # doesn't match, "girl" is not at end of string

    $x =~ /Perl\Z/m; # matches, "Perl" is at newline before end
    $x =~ /Perl\z/m; # doesn't match, "Perl" is not at end of string

We now know how to create choices among classes of characters in a
regexp.  What about choices among words or character strings? Such
choices are described in the next section.

=head2 Matching this or that

Sometimes we would like our regexp to be able to match different
possible words or character strings.  This is accomplished by using
the B<alternation> metacharacter C<|>.  To match C<dog> or C<cat>, we
form the regexp C<dog|cat>.  As before, perl will try to match the
regexp at the earliest possible point in the string.  At each
character position, perl will first try to match the first
alternative, C<dog>.  If C<dog> doesn't match, perl will then try the
next alternative, C<cat>.  If C<cat> doesn't match either, then the
match fails and perl moves to the next position in the string.  Some
examples:

    "cats and dogs" =~ /cat|dog|bird/;  # matches "cat"
    "cats and dogs" =~ /dog|cat|bird/;  # matches "cat"

Even though C<dog> is the first alternative in the second regexp,
C<cat> is able to match earlier in the string.

    "cats"          =~ /c|ca|cat|cats/; # matches "c"
    "cats"          =~ /cats|cat|ca|c/; # matches "cats"

Here, all the alternatives match at the first string position, so the
first alternative is the one that matches.  If some of the
alternatives are truncations of the others, put the longest ones first
to give them a chance to match.

    "cab" =~ /a|b|c/ # matches "c"
                     # /a|b|c/ == /[abc]/

The last example points out that character classes are like
alternations of characters.  At a given character position, the first
alternative that allows the regexp match to succeed will be the one
that matches.
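
The comments in the examples above can be checked with a few lines of
code.  This is only a sketch: C<matched_text> is a hypothetical helper
of our own, and it uses the C<@-> and C<@+> arrays, explained later in
this tutorial, to cut the matched text out of the string:

    use strict;
    use warnings;

    # Return the text that $pattern matched in $string, or undef
    sub matched_text {
        my ($string, $pattern) = @_;
        return $string =~ /$pattern/
            ? substr($string, $-[0], $+[0] - $-[0])
            : undef;
    }

    print matched_text("cats and dogs", 'dog|cat|bird'), "\n";  # cat
    print matched_text("cats", 'c|ca|cat|cats'), "\n";          # c
    print matched_text("cats", 'cats|cat|ca|c'), "\n";          # cats

The earliest position always wins first; the order of the alternatives
only matters among those that can match at that position.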

=head2 Grouping things and hierarchical matching

Alternation allows a regexp to choose among alternatives, but by
itself it is unsatisfying.  The reason is that each alternative is a whole
regexp, but sometimes we want alternatives for just part of a
regexp.  For instance, suppose we want to search for housecats or
housekeepers.  The regexp C<housecat|housekeeper> fits the bill, but is
inefficient because we had to type C<house> twice.  It would be nice to
have parts of the regexp be constant, like C<house>, and some
parts have alternatives, like C<cat|keeper>.

The B<grouping> metacharacters C<()> solve this problem.  Grouping
allows parts of a regexp to be treated as a single unit.  Parts of a
regexp are grouped by enclosing them in parentheses.  Thus we could solve
the C<housecat|housekeeper> problem by forming the regexp as
C<house(cat|keeper)>.  The regexp C<house(cat|keeper)> means match
C<house> followed by either C<cat> or C<keeper>.  Some more examples
are

    /(a|b)b/;    # matches 'ab' or 'bb'
    /(ac|b)b/;   # matches 'acb' or 'bb'
    /(^a|b)c/;   # matches 'ac' at start of string or 'bc' anywhere
    /(a|[bc])d/; # matches 'ad', 'bd', or 'cd'

    /house(cat|)/;  # matches either 'housecat' or 'house'
    /house(cat(s|)|)/;  # matches either 'housecats' or 'housecat' or
                        # 'house'.  Note groups can be nested.

    /(19|20|)\d\d/;  # match years 19xx, 20xx, or the Y2K problem, xx
    "20" =~ /(19|20|)\d\d/;  # matches the null alternative '()\d\d',
                             # because '20\d\d' can't match

Alternations behave the same way in groups as out of them: at a given
string position, the leftmost alternative that allows the regexp to
match is taken.  So in the last example at the first string position,
C<"20"> matches the second alternative, but there is nothing left over
to match the next two digits C<\d\d>.  So perl moves on to the next
alternative, which is the null alternative and that works, since
C<"20"> is two digits.

The process of trying one alternative, seeing if it matches, and
moving on to the next alternative if it doesn't, is called
B<backtracking>.  The term 'backtracking' comes from the idea that
matching a regexp is like a walk in the woods.  Successfully matching
a regexp is like arriving at a destination.  There are many possible
trailheads, one for each string position, and each one is tried in
order, left to right.  From each trailhead there may be many paths,
some of which get you there, and some which are dead ends.  When you
walk along a trail and hit a dead end, you have to backtrack along the
trail to an earlier point to try another trail.  If you hit your
destination, you stop immediately and forget about trying all the
other trails.  You are persistent, and only if you have tried all the
trails from all the trailheads and not arrived at your destination, do
you declare failure.  To be concrete, here is a step-by-step analysis
of what perl does when it tries to match the regexp

    "abcde" =~ /(abd|abc)(df|d|de)/;

=over 4

=item 0

Start with the first letter in the string 'a'.

=item 1

Try the first alternative in the first group 'abd'.

=item 2

Match 'a' followed by 'b'. So far so good.

=item 3

'd' in the regexp doesn't match 'c' in the string - a dead
end.  So backtrack two characters and pick the second alternative in
the first group 'abc'.

=item 4

Match 'a' followed by 'b' followed by 'c'.  We are on a roll
and have satisfied the first group. Set $1 to 'abc'.

=item 5

Move on to the second group and pick the first alternative
'df'.

=item 6

Match the 'd'.

=item 7

'f' in the regexp doesn't match 'e' in the string, so a dead
end.  Backtrack one character and pick the second alternative in the
second group 'd'.

=item 8

'd' matches. The second grouping is satisfied, so set $2 to
'd'.

=item 9

We are at the end of the regexp, so we are done! We have
matched 'abcd' out of the string "abcde".

=back

There are a couple of things to note about this analysis.  First, the
third alternative in the second group 'de' also allows a match, but we
stopped before we got to it - at a given character position, leftmost
wins.  Second, we were able to get a match at the first character
position of the string 'a'.  If there were no matches at the first
position, perl would move to the second character position 'b' and
attempt the match all over again.  Only when all possible paths at all
possible character positions have been exhausted does perl give
up and declare S<C<$string =~ /(abd|abc)(df|d|de)/;> > to be false.
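
The outcome of the walk can be confirmed by running the match and
looking at what ended up in each group.  A minimal sketch, with
variable names of our own choosing (C<@-> and C<@+> are described in
the next section):

    use strict;
    use warnings;

    my ($first, $second, $whole);
    if ("abcde" =~ /(abd|abc)(df|d|de)/) {
        ($first, $second) = ($1, $2);
        $whole = substr("abcde", $-[0], $+[0] - $-[0]);
        print "group 1: $first\n";    # abc
        print "group 2: $second\n";   # d
        print "matched: $whole\n";    # abcd
    }

The second group holds C<'d'> rather than C<'de'>, because C<d> comes
before C<de> in the list of alternatives and already lets the regexp
succeed.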

Even with all this work, regexp matching happens remarkably fast.  To
speed things up, during the compilation stage, perl compiles the regexp
into a compact sequence of opcodes that can often fit inside a
processor cache.  When the code is executed, these opcodes can then run
at full throttle and search very quickly.

=head2 Extracting matches

The grouping metacharacters C<()> also serve another completely
different function: they allow the extraction of the parts of a string
that matched.  This is very useful to find out what matched and for
text processing in general.  For each grouping, the part that matched
inside goes into the special variables C<$1>, C<$2>, etc.  They can be
used just as ordinary variables:

    # extract hours, minutes, seconds
    $time =~ /(\d\d):(\d\d):(\d\d)/;  # match hh:mm:ss format
    $hours = $1;
    $minutes = $2;
    $seconds = $3;

Now, we know that in scalar context,
S<C<$time =~ /(\d\d):(\d\d):(\d\d)/> > returns a true or false
value.  In list context, however, it returns the list of matched values
C<($1,$2,$3)>.  So we could write the code more compactly as

    # extract hours, minutes, seconds
    ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
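
When the match fails in list context, the list is empty, so the
assignment can double as the test in a conditional.  Here is a sketch
of that idiom; C<parse_time> is a hypothetical helper and the input
strings are invented:

    use strict;
    use warnings;

    # Return (hours, minutes, seconds), or an empty list when the
    # string holds no hh:mm:ss field
    sub parse_time {
        my ($time) = @_;
        return $time =~ /(\d\d):(\d\d):(\d\d)/;
    }

    for my $time ("started at 08:15:42", "no time here") {
        if (my ($hours, $minutes, $seconds) = parse_time($time)) {
            print "$hours h, $minutes m, $seconds s\n";
        }
        else {
            print "no match in: $time\n";
        }
    }

    # prints:
    #   08 h, 15 m, 42 s
    #   no match in: no time here

Checking the result this way avoids reading C<$1>, C<$2>, ... after a
failed match, when they would still hold values from some earlier
successful match.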

If the groupings in a regexp are nested, C<$1> gets the group with the
leftmost opening parenthesis, C<$2> the next opening parenthesis,
etc.  For example, here is a complex regexp and the matching variables
indicated below it:

    /(ab(cd|ef)((gi)|j))/;
     1  2      34

so that if the regexp matched, e.g., C<$2> would contain 'cd' or 'ef'.
For convenience, perl sets C<$+> to the highest numbered C<$1>, C<$2>,
... that got assigned.
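
To see the numbering at work, the following sketch matches that regexp
against one sample string (chosen by us so that all four groups take
part in the match) and prints each variable:

    use strict;
    use warnings;

    my @groups;
    if ("abcdgi" =~ /(ab(cd|ef)((gi)|j))/) {
        @groups = ($1, $2, $3, $4);
        print '$1 = ', $1, "\n";   # abcdgi
        print '$2 = ', $2, "\n";   # cd
        print '$3 = ', $3, "\n";   # gi
        print '$4 = ', $4, "\n";   # gi
        print '$+ = ', $+, "\n";   # gi, same as $4 here
    }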

Closely associated with the matching variables C<$1>, C<$2>, ... are
the B<backreferences> C<\1>, C<\2>, ... .  Backreferences are simply
matching variables that can be used I<inside> a regexp.  This is a
really nice feature - what matches later in a regexp can depend on
what matched earlier in the regexp.  Suppose we wanted to look
for doubled words in text, like 'the the'.  The following regexp finds
all 3-letter doubles with a space in between:

    /(\w\w\w)\s\1/;

The grouping assigns a value to C<\1>, so that the same 3-letter sequence
is used for both parts.  Here are some words with repeated parts:

    % simple_grep '^(\w\w\w\w|\w\w\w|\w\w|\w)\1$' /usr/dict/words
    beriberi
    booboo
    coco
    mama
    murmur
    papa

The regexp has a single grouping which considers 4-letter
combinations, then 3-letter combinations, etc.  and uses C<\1> to look for
a repeat.  Although C<$1> and C<\1> represent the same thing, care should be
taken to use matched variables C<$1>, C<$2>, ... only outside a regexp
and backreferences C<\1>, C<\2>, ... only inside a regexp; not doing
so may lead to surprising and/or undefined results.
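
The division of labor looks like this in practice.  In this short
sketch (the sample sentence is our own), C<\1> is used inside the
regexp to demand a repeat and C<$1> is used afterwards to report it:

    use strict;
    use warnings;

    my $text = "Paris in the the spring";
    my $doubled;
    if ($text =~ /(\w\w\w)\s\1/) {
        $doubled = $1;    # outside the regexp, use $1 rather than \1
        print "doubled word: $doubled\n";   # doubled word: the
    }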

In addition to what was matched, Perl 5.6.0 also provides the
positions of what was matched with the C<@-> and C<@+>
arrays. C<$-[0]> is the position of the start of the entire match and
C<$+[0]> is the position of the end. Similarly, C<$-[n]> is the
position of the start of the C<$n> match and C<$+[n]> is the position
of the end. If C<$n> is undefined, so are C<$-[n]> and C<$+[n]>. Then
this code

    $x = "Mmm...donut, thought Homer";
    $x =~ /^(Mmm|Yech)\.\.\.(donut|peas)/; # matches
    foreach $expr (1..$#-) {
        print "Match $expr: '${$expr}' at position ($-[$expr],$+[$expr])\n";
    }

prints

    Match 1: 'Mmm' at position (0,3)
    Match 2: 'donut' at position (6,11)

Even if there are no groupings in a regexp, it is still possible to
find out what exactly matched in a string.  If you use them, perl
will set C<$`> to the part of the string before the match, will set C<$&>
to the part of the string that matched, and will set C<$'> to the part
of the string after the match.  An example:

    $x = "the cat caught the mouse";
    $x =~ /cat/;  # $` = 'the ', $& = 'cat', $' = ' caught the mouse'
    $x =~ /the/;  # $` = '', $& = 'the', $' = ' cat caught the mouse'

In the second match, S<C<$` = ''> > because the regexp matched at the
first character position in the string and stopped; it never saw the
second 'the'.  It is important to note that using C<$`> and C<$'>
slows down regexp matching quite a bit, and C< $& > slows it down to a
lesser extent, because if they are used in one regexp in a program,
they are generated for I<all> regexps in the program.  So if raw
performance is a goal of your application, they should be avoided.
If you need the same information, use C<@-> and C<@+> instead:

    $` is the same as substr( $x, 0, $-[0] )
    $& is the same as substr( $x, $-[0], $+[0]-$-[0] )
    $' is the same as substr( $x, $+[0] )
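
As a minimal sketch of these equivalences (the variable names C<$pre>,
C<$match>, and C<$post> are ours, chosen for illustration), the three
pieces can be recovered from C<@-> and C<@+> alone:

    use strict;
    use warnings;

    my $x = "the cat caught the mouse";
    my ($pre, $match, $post);
    if ($x =~ /cat/) {
        $pre   = substr($x, 0, $-[0]);              # text before the match
        $match = substr($x, $-[0], $+[0] - $-[0]);  # the match itself
        $post  = substr($x, $+[0]);                 # text after the match
    }
    print "[$pre][$match][$post]\n";  # [the ][cat][ caught the mouse]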

=head2 Matching repetitions

The examples in the previous section display an annoying weakness.  We
were only matching 3-letter words, or syllables of 4 letters or
less.  We'd like to be able to match words or syllables of any length,
without writing out tedious alternatives like
C<\w\w\w\w|\w\w\w|\w\w|\w>.

This is exactly the problem the B<quantifier> metacharacters C<?>,
C<*>, C<+>, and C<{}> were created for.  They allow us to determine the
number of repeats of a portion of a regexp we consider to be a
match.  Quantifiers are put immediately after the character, character
class, or grouping that we want to specify.  They have the following
meanings:

=over 4

=item *

C<a?> = match 'a' 1 or 0 times

=item *

C<a*> = match 'a' 0 or more times, i.e., any number of times

=item *

C<a+> = match 'a' 1 or more times, i.e., at least once

=item *

C<a{n,m}> = match at least C<n> times, but not more than C<m>
times.

=item *

C<a{n,}> = match C<n> or more times

=item *

C<a{n}> = match exactly C<n> times

=back

Here are some examples:

    /[a-z]+\s+\d*/;  # match a lowercase word, at least some space, and
                     # any number of digits
    /(\w+)\s+\1/;    # match doubled words of arbitrary length
    /y(es)?/i;       # matches 'y', 'Y', or a case-insensitive 'yes'
    $year =~ /\d{2,4}/;  # make sure year is at least 2 but not more
                         # than 4 digits
    $year =~ /\d{4}|\d{2}/;    # better match; throw out 3 digit dates
    $year =~ /\d{2}(\d{2})?/;  # same thing written differently. However,
                               # this produces $1 and the other does not.

    % simple_grep '^(\w+)\1$' /usr/dict/words   # isn't this easier?
    beriberi
    booboo
    coco
    mama
    murmur
    papa
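
One caveat about the C<$year> examples: without anchors, a quantifier
limits only how much one element matches, not the length of the whole
string, so C</\d{2,4}/> also succeeds on a five-digit string.  Here is a
hedged sketch, assuming we want to accept only strings of exactly 2 or
exactly 4 digits (the sample values are made up):

    use strict;
    use warnings;

    my @accepted;
    foreach my $year ('99', '1999', '199', '19999') {
        # ^ and $ pin the alternation to the whole string
        push @accepted, $year if $year =~ /^(\d{4}|\d{2})$/;
    }
    print "accepted: @accepted\n";                # accepted: 99 1999
    print "unanchored \\d{2,4} matches 19999\n"
        if '19999' =~ /\d{2,4}/;                  # this line is printed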

For all of these quantifiers, perl will try to match as much of the
string as possible, while still allowing the regexp to succeed.  Thus
with C</a?.../>, perl will first try to match the regexp with the C<a>
present; if that fails, perl will try to match the regexp without the
C<a> present.  For the quantifier C<*>, we get the following:

    $x = "the cat in the hat";
    $x =~ /^(.*)(cat)(.*)$/; # matches,
                             # $1 = 'the '
                             # $2 = 'cat'
                             # $3 = ' in the hat'

This is what we might expect: the match finds the only C<cat> in the
string and locks onto it.  Consider, however, this regexp:

    $x =~ /^(.*)(at)(.*)$/; # matches,
                            # $1 = 'the cat in the h'
                            # $2 = 'at'
                            # $3 = ''   (0 matches)

One might initially guess that perl would find the C<at> in C<cat> and
stop there, but that wouldn't give the longest possible string to the
first quantifier C<.*>.  Instead, the first quantifier C<.*> grabs as
much of the string as possible while still having the regexp match.  In
this example, that means matching the C<at> element against the final
C<at> in the string.  The other important principle illustrated here is
that when there are two or more elements in a regexp, the I<leftmost>
quantifier, if there is one, gets to grab as much of the string as
possible, leaving the rest of the regexp to fight over scraps.  Thus in
our example, the first quantifier C<.*> grabs most of the string, while
the second quantifier C<.*> gets the empty string.   Quantifiers that
grab as much of the string as possible are called B<maximal match> or
B<greedy> quantifiers.

When a regexp can match a string in several different ways, we can use
the following principles to predict which way the regexp will match:

=over 4

=item *

Principle 0: Taken as a whole, any regexp will be matched at the
earliest possible position in the string.

=item *

Principle 1: In an alternation C<a|b|c...>, the leftmost alternative
that allows a match for the whole regexp will be the one used.

=item *

Principle 2: The maximal matching quantifiers C<?>, C<*>, C<+> and
C<{n,m}> will in general match as much of the string as possible while
still allowing the whole regexp to match.

=item *

Principle 3: If there are two or more elements in a regexp, the
leftmost greedy quantifier, if any, will match as much of the string
as possible while still allowing the whole regexp to match.  The next
leftmost greedy quantifier, if any, will try to match as much of the
string remaining available to it as possible, while still allowing the
whole regexp to match.  And so on, until all the regexp elements are
satisfied.

=back

As we have seen above, Principle 0 overrides the others - the regexp
will be matched as early as possible, with the other principles
determining how the regexp matches at that earliest character
position.

Here is an example of these principles in action:

    $x = "The programming republic of Perl";
    $x =~ /^(.+)(e|r)(.*)$/;  # matches,
                              # $1 = 'The programming republic of Pe'
                              # $2 = 'r'
                              # $3 = 'l'

This regexp matches at the earliest string position, C<'T'>.  One
might think that C<e>, being leftmost in the alternation, would be
matched, but C<r> produces the longest string in the first quantifier.

    $x =~ /(m{1,2})(.*)$/;  # matches,
                            # $1 = 'mm'
                            # $2 = 'ing republic of Perl'

Here, the earliest possible match is at the first C<'m'> in
C<programming>. C<m{1,2}> is the first quantifier, so it gets to match
a maximal C<mm>.

    $x =~ /.*(m{1,2})(.*)$/;  # matches,
                              # $1 = 'm'
                              # $2 = 'ing republic of Perl'

Here, the regexp matches at the start of the string. The first
quantifier C<.*> grabs as much as possible, leaving just a single
C<'m'> for the second quantifier C<m{1,2}>.

    $x =~ /(.?)(m{1,2})(.*)$/;  # matches,
                                # $1 = 'a'
                                # $2 = 'mm'
                                # $3 = 'ing republic of Perl'

Here, C<.?> eats its maximal one character at the earliest possible
position in the string, C<'a'> in C<programming>, leaving C<m{1,2}>
the opportunity to match both C<m>'s. Finally,

    "aXXXb" =~ /(X*)/; # matches with $1 = ''

because it can match zero copies of C<'X'> at the beginning of the
string.  If you definitely want to match at least one C<'X'>, use
C<X+>, not C<X*>.
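
Principles 0 and 1 can be checked the same way.  In this small sketch
(the strings are invented for illustration), the alternation picks the
leftmost alternative that works rather than the longest one, and the
earliest starting position wins over a longer match later in the
string:

    use strict;
    use warnings;

    # Principle 1: 'c' is the leftmost alternative that allows a match
    my ($first) = "cats and dogs" =~ /(c|ca|cat|cats)/;

    # Principle 0: 'dog' at position 0 beats 'cats' at position 9
    my ($early) = "dogs and cats" =~ /(cats|dog)/;

    print "$first $early\n";   # c dog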

Sometimes greed is not good.  At times, we would like quantifiers to
match a I<minimal> piece of string, rather than a maximal piece.  For
this purpose, Larry Wall created the S<B<minimal match> > or
B<non-greedy> quantifiers C<??>, C<*?>, C<+?>, and C<{}?>.  These are
the usual quantifiers with a C<?> appended to them.  They have the
following meanings:

=over 4

=item *

C<a??> = match 'a' 0 or 1 times. Try 0 first, then 1.

=item *

C<a*?> = match 'a' 0 or more times, i.e., any number of times,
but as few times as possible

=item *

C<a+?> = match 'a' 1 or more times, i.e., at least once, but
as few times as possible

=item *

C<a{n,m}?> = match at least C<n> times, not more than C<m>
times, as few times as possible

=item *

C<a{n,}?> = match at least C<n> times, but as few times as
possible

=item *

C<a{n}?> = match exactly C<n> times.  Because we match exactly
C<n> times, C<a{n}?> is equivalent to C<a{n}> and is just there for
notational consistency.

=back

Let's look at the example above, but with minimal quantifiers:

    $x = "The programming republic of Perl";
    $x =~ /^(.+?)(e|r)(.*)$/; # matches,
                              # $1 = 'Th'
                              # $2 = 'e'
                              # $3 = ' programming republic of Perl'

The minimal string that will allow both the start of the string C<^>
and the alternation to match is C<Th>, with the alternation C<e|r>
matching C<e>.  The second quantifier C<.*> is free to gobble up the
rest of the string.

    $x =~ /(m{1,2}?)(.*?)$/;  # matches,
                              # $1 = 'm'
                              # $2 = 'ming republic of Perl'

The first string position that this regexp can match is at the first
C<'m'> in C<programming>. At this position, the minimal C<m{1,2}?>
matches just one C<'m'>.  Although the second quantifier C<.*?> would
prefer to match no characters, it is constrained by the end-of-string
anchor C<$> to match the rest of the string.

    $x =~ /(.*?)(m{1,2}?)(.*)$/;  # matches,
                                  # $1 = 'The progra'
                                  # $2 = 'm'
                                  # $3 = 'ming republic of Perl'

In this regexp, you might expect the first minimal quantifier C<.*?>
to match the empty string, because it is not constrained by a C<^>
anchor to match the beginning of the word.  Principle 0 applies here,
however.  Because it is possible for the whole regexp to match at the
start of the string, it I<will> match at the start of the string.  Thus
the first quantifier has to match everything up to the first C<m>.  The
second minimal quantifier matches just one C<m> and the third
quantifier matches the rest of the string.

    $x =~ /(.??)(m{1,2})(.*)$/;  # matches,
                                 # $1 = 'a'
                                 # $2 = 'mm'
                                 # $3 = 'ing republic of Perl'

Just as in the previous regexp, the first quantifier C<.??> can match
earliest at position C<'a'>, so it does.  The second quantifier is
greedy, so it matches C<mm>, and the third matches the rest of the
string.

We can modify Principle 3 above to take into account non-greedy
quantifiers:

=over 4

=item *

Principle 3: If there are two or more elements in a regexp, the
leftmost greedy (non-greedy) quantifier, if any, will match as much
(little) of the string as possible while still allowing the whole
regexp to match.  The next leftmost greedy (non-greedy) quantifier, if
any, will try to match as much (little) of the string remaining
available to it as possible, while still allowing the whole regexp to
match.  And so on, until all the regexp elements are satisfied.

=back
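
A common practical use of the minimal quantifiers is extracting the
first delimited item from a line.  The following sketch uses an
invented sample string; it compares C<.*>, which runs to the last
closing quote, with C<.*?>, which stops at the first:

    use strict;
    use warnings;

    my $line = 'say "hello" and then "goodbye" loudly';
    my ($greedy)  = $line =~ /"(.*)"/;    # hello" and then "goodbye
    my ($minimal) = $line =~ /"(.*?)"/;   # hello
    print "greedy:  $greedy\n";
    print "minimal: $minimal\n";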

Just like alternation, quantifiers are also susceptible to
backtracking.  Here is a step-by-step analysis of the example

    $x = "the cat in the hat";
    $x =~ /^(.*)(at)(.*)$/; # matches,
                            # $1 = 'the cat in the h'
                            # $2 = 'at'
                            # $3 = ''   (0 matches)

=over 4

=item 0

Start with the first letter in the string 't'.

=item 1

The first quantifier '.*' starts out by matching the whole
string 'the cat in the hat'.

=item 2

'a' in the regexp element 'at' doesn't match the end of the
string.  Backtrack one character.

=item 3

'a' in the regexp element 'at' still doesn't match the last
letter of the string 't', so backtrack one more character.

=item 4

Now we can match the 'a' and the 't'.

=item 5

Move on to the third element '.*'.  Since we are at the end of
the string and '.*' can match 0 times, assign it the empty string.

=item 6

We are done!

=back

Most of the time, all this moving forward and backtracking happens
quickly and searching is fast.   There are some pathological regexps,
however, whose execution time grows exponentially with the size of the
string.  A typical structure that blows up in your face is of the form

    /(a|b+)*/;

The problem is the nested indeterminate quantifiers.  There are many
different ways of partitioning a string of length n between the C<+>
and C<*>: one repetition with C<b+> of length n, two repetitions with
the first C<b+> length k and the second with length n-k, m repetitions
whose bits add up to length n, etc.  In fact there are an exponential
number of ways to partition a string as a function of length.  A
regexp may get lucky and match early in the process, but if there is
no match, perl will try I<every> possibility before giving up.  So be
careful with nested C<*>'s, C<{n,m}>'s, and C<+>'s.  The book
I<Mastering Regular Expressions> by Jeffrey Friedl gives a wonderful
discussion of this and other efficiency issues.
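
One way to sidestep the problem, when the pattern allows it, is to
rewrite the nested quantifiers as a single character class.  As a
sketch, C<(a|b+)*> accepts exactly the strings made of C<a>'s and
C<b>'s, so C<[ab]*> describes the same set with only one way to divide
up the string.  The anchors and the test strings below are our own
additions for illustration:

    use strict;
    use warnings;

    my $nested = '^(a|b+)*$';  # many ways to carve up a run of b's
    my $flat   = '^[ab]*$';    # only one way
    my $agree  = 1;
    foreach my $s ('', 'a', 'bbb', 'abba', 'abbbc', 'b' x 20 . 'c') {
        my $n = $s =~ /$nested/ ? 1 : 0;
        my $f = $s =~ /$flat/   ? 1 : 0;
        $agree = 0 if $n != $f;   # the two regexps disagree on $s
    }
    print $agree ? "same answers\n" : "different answers\n";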

=head2 Building a regexp

At this point, we have all the basic regexp concepts covered, so let's
give a more involved example of a regular expression.  We will build a
regexp that matches numbers.

The first task in building a regexp is to decide what we want to match
and what we want to exclude.  In our case, we want to match both
integers and floating point numbers and we want to reject any string
that isn't a number.

The next task is to break the problem down into smaller problems that
are easily converted into a regexp.

The simplest case is integers.  These consist of a sequence of digits,
with an optional sign in front.  The digits we can represent with
C<\d+> and the sign can be matched with C<[+-]>.  Thus the integer
regexp is

    /[+-]?\d+/;  # matches integers

A floating point number potentially has a sign, an integral part, a
decimal point, a fractional part, and an exponent.  One or more of these
parts is optional, so we need to check out the different
possibilities.  Floating point numbers which are in proper form include
123., 0.345, .34, -1e6, and 25.4E-72.  As with integers, the sign out
front is completely optional and can be matched by C<[+-]?>.  We can
see that if there is no exponent, floating point numbers must have a
decimal point, otherwise they are integers.  We might be tempted to
model these with C<\d*\.\d*>, but this would also match just a single
decimal point, which is not a number.  So the three cases of floating
point number sans exponent are

   /[+-]?\d+\./;  # 1., 321., etc.
   /[+-]?\.\d+/;  # .1, .234, etc.
   /[+-]?\d+\.\d+/;  # 1.0, 30.56, etc.

These can be combined into a single regexp with a three-way alternation:

   /[+-]?(\d+\.\d+|\d+\.|\.\d+)/;  # floating point, no exponent

In this alternation, it is important to put C<'\d+\.\d+'> before
C<'\d+\.'>.  If C<'\d+\.'> were first, the regexp would happily match that
and ignore the fractional part of the number.

Now consider floating point numbers with exponents.  The key
observation here is that I<both> integers and numbers with decimal
points are allowed in front of an exponent.  Then exponents, like the
overall sign, are independent of whether we are matching numbers with
or without decimal points, and can be 'decoupled' from the
mantissa.  The overall form of the regexp now becomes clear:

    /^(optional sign)(integer | f.p. mantissa)(optional exponent)$/;

The exponent is an C<e> or C<E>, followed by an integer.  So the
exponent regexp is

   /[eE][+-]?\d+/;  # exponent

Putting all the parts together, we get a regexp that matches numbers:

   /^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$/;  # Ta da!

Long regexps like this may impress your friends, but can be hard to
decipher.  In complex situations like this, the C<//x> modifier for a
match is invaluable.  It allows one to put nearly arbitrary whitespace
and comments into a regexp without affecting its meaning.  Using it,
we can rewrite our 'extended' regexp in the more pleasing form

   /^
      [+-]?         # first, match an optional sign
      (             # then match integers or f.p. mantissas:
          \d+\.\d+  # mantissa of the form a.b
         |\d+\.     # mantissa of the form a.
         |\.\d+     # mantissa of the form .b
         |\d+       # integer of the form a
      )
      ([eE][+-]?\d+)?  # finally, optionally match an exponent
   $/x;

If whitespace is mostly irrelevant, how does one include space
characters in an extended regexp? The answer is to backslash the space,
S<C<'\ '> >, or put it in a character class, S<C<[ ]> >.  The same goes
for pound signs: use C<\#> or C<[#]>.  For instance, Perl allows
a space between the sign and the mantissa/integer, and we could add
this to our regexp as follows:

   /^
      [+-]?\ *      # first, match an optional sign *and space*
      (             # then match integers or f.p. mantissas:
          \d+\.\d+  # mantissa of the form a.b
         |\d+\.     # mantissa of the form a.
         |\.\d+     # mantissa of the form .b
         |\d+       # integer of the form a
      )
      ([eE][+-]?\d+)?  # finally, optionally match an exponent
   $/x;

In this form, it is easier to see a way to simplify the
alternation.  Alternatives 1, 2, and 4 all start with C<\d+>, so it
could be factored out:

   /^
      [+-]?\ *      # first, match an optional sign
      (             # then match integers or f.p. mantissas:
          \d+       # start out with a ...
          (
              \.\d* # mantissa of the form a.b or a.
          )?        # ? takes care of integers of the form a
         |\.\d+     # mantissa of the form .b
      )
      ([eE][+-]?\d+)?  # finally, optionally match an exponent
   $/x;

or written in the compact form,

    /^[+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$/;

This is our final regexp.  To recap, we built a regexp by

=over 4

=item *

specifying the task in detail,

=item *

breaking down the problem into smaller parts,

=item *

translating the small parts into regexps,

=item *

combining the regexps,

=item *

and optimizing the final combined regexp.

=back

These are also the typical steps involved in writing a computer
program.  This makes perfect sense, because regular expressions are
essentially programs written in a little computer language that specifies
patterns.
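
To check the final regexp against the cases listed earlier, we can run
it over a handful of strings.  This is only a sketch: the sample
strings are ours, and the regexp is kept in an ordinary string variable
so that it can be reused inside the loop:

    use strict;
    use warnings;

    my $number = '^[+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$';
    my (@good, @bad);
    foreach my $s ('123.', '0.345', '.34', '-1e6', '25.4E-72',
                   '- 7', '.', '1e', 'abc') {
        if ($s =~ /$number/) { push @good, $s }
        else                 { push @bad,  $s }
    }
    print "numbers:     @good\n";  # 123. 0.345 .34 -1e6 25.4E-72 - 7
    print "not numbers: @bad\n";   # . 1e abc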

=head2 Using regular expressions in Perl

The last topic of Part 1 briefly covers how regexps are used in Perl
programs.  Where do they fit into Perl syntax?

We have already introduced the matching operator in its default
C</regexp/> and arbitrary delimiter C<m!regexp!> forms.  We have used
the binding operator C<=~> and its negation C<!~> to test for string
matches.  Associated with the matching operator, we have discussed the
single line C<//s>, multi-line C<//m>, case-insensitive C<//i> and
extended C<//x> modifiers.

There are a few more things you might want to know about matching
operators.  First, we pointed out earlier that variables in regexps are
substituted before the regexp is evaluated:

    $pattern = 'Seuss';
    while (<>) {
        print if /$pattern/;
    }

This will print any lines containing the word C<Seuss>.  It is not as
efficient as it could be, however, because perl has to re-evaluate
C<$pattern> each time through the loop.  If C<$pattern> won't be
changing over the lifetime of the script, we can add the C<//o>
modifier, which directs perl to only perform variable substitutions
once:

    #!/usr/bin/perl
    #    Improved simple_grep
    $regexp = shift;
    while (<>) {
        print if /$regexp/o;  # a good deal faster
    }

If you change the variable after the first substitution happens, perl
will ignore it.  If you don't want any substitutions at all, use the
special delimiter C<m''>:

    $pattern = 'Seuss';
    while (<>) {
        print if m'$pattern';  # matches '$pattern', not 'Seuss'
    }

C<m''> acts like single quotes on a regexp; all other C<m> delimiters
act like double quotes.  If the regexp evaluates to the empty string,
the regexp in the I<last successful match> is used instead.  So we have

    "dog" =~ /d/;  # 'd' matches
    "dogbert" =~ //;  # this matches the 'd' regexp used before

The final two modifiers C<//g> and C<//c> concern multiple matches.
The modifier C<//g> stands for global matching and allows the
matching operator to match within a string as many times as possible.
In scalar context, successive invocations against a string will have
C<//g> jump from match to match, keeping track of position in the
string as it goes along.  You can get or set the position with the
C<pos()> function.

The use of C<//g> is shown in the following example.  Suppose we have
a string that consists of words separated by spaces.  If we know how
many words there are in advance, we could extract the words using
groupings:

    $x = "cat dog house"; # 3 words
    $x =~ /^\s*(\w+)\s+(\w+)\s+(\w+)\s*$/; # matches,
                                           # $1 = 'cat'
                                           # $2 = 'dog'
                                           # $3 = 'house'

But what if we had an indeterminate number of words? This is the sort
of task C<//g> was made for.  To extract all words, form the simple
regexp C<(\w+)> and loop over all matches with C</(\w+)/g>:

    while ($x =~ /(\w+)/g) {
        print "Word is $1, ends at position ", pos $x, "\n";
    }

prints

    Word is cat, ends at position 3
    Word is dog, ends at position 7
    Word is house, ends at position 13

A failed match or changing the target string resets the position.  If
you don't want the position reset after failure to match, add the
C<//c>, as in C</regexp/gc>.  The current position in the string is
associated with the string, not the regexp.  This means that different
strings have different positions and their respective positions can be
set or read independently.
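
The difference C<//c> makes can be seen by checking C<pos()> after a
failed match.  In this sketch (the strings are invented for
illustration), the first string loses its position on failure while
the second keeps it:

    use strict;
    use warnings;

    my $s = "cat dog";
    $s =~ /cat/g;                   # pos($s) is now 3
    $s =~ /bird/g;                  # fails, so the position is reset
    my $after_g = defined pos($s) ? pos($s) : 'undef';

    my $t = "cat dog";
    $t =~ /cat/g;                   # pos($t) is now 3
    $t =~ /bird/gc;                 # fails, but the position is kept
    my $after_gc = defined pos($t) ? pos($t) : 'undef';

    print "after //g: $after_g, after //gc: $after_gc\n";
    # after //g: undef, after //gc: 3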

In list context, C<//g> returns a list of matched groupings, or if
there are no groupings, a list of matches to the whole regexp.  So if
we wanted just the words, we could use

    @words = ($x =~ /(\w+)/g);  # matches,
                                # $words[0] = 'cat'
                                # $words[1] = 'dog'
                                # $words[2] = 'house'
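
When the regexp has more than one grouping, list-context C<//g>
returns the groupings from every match in order, which makes it
convenient for filling a hash.  A sketch with an invented
configuration string:

    use strict;
    use warnings;

    my $config  = "width=80 height=24 depth=8";
    my %setting = $config =~ /(\w+)=(\d+)/g;  # (key, value, key, value, ...)
    foreach my $key (sort keys %setting) {
        print "$key is $setting{$key}\n";
    }

This prints the three settings in alphabetical order of their names.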

Closely associated with the C<//g> modifier is the C<\G> anchor.  The
C<\G> anchor matches at the point where the previous C<//g> match left
off.  C<\G> allows us to easily do context-sensitive matching:

    $metric = 1;  # use metric units
    ...
    $x = <FILE>;  # read in measurement
    $x =~ /^([+-]?\d+)\s*/g;  # get magnitude
    $weight = $1;
    if ($metric) { # error checking
        print "Units error!" unless $x =~ /\Gkg\./g;
    }
    else {
        print "Units error!" unless $x =~ /\Glbs\./g;
    }
    $x =~ /\G\s+(widget|sprocket)/g;  # continue processing

The combination of C<//g> and C<\G> allows us to process the string a
bit at a time and use arbitrary Perl logic to decide what to do next.
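
As a further sketch of this style, here is a tiny tokenizer that walks
through an arithmetic expression one token at a time.  The token
categories and the input string are invented for illustration;
C<//gc> is used so that a failed alternative leaves the position where
it was:

    use strict;
    use warnings;

    my $expr = "12 + 34*5";
    my @tokens;
    while (1) {
        if    ($expr =~ /\G\s+/gc)       { next }                   # skip spaces
        elsif ($expr =~ /\G(\d+)/gc)     { push @tokens, "NUM($1)" }
        elsif ($expr =~ /\G([-+*\/])/gc) { push @tokens, "OP($1)" }
        else                             { last }                   # end or junk
    }
    print "@tokens\n";   # NUM(12) OP(+) NUM(34) OP(*) NUM(5)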

C<\G> is also invaluable in processing fixed length records with
regexps.  Suppose we have a snippet of coding region DNA, encoded as
base pair letters C<ATCGTTGAAT...> and we want to find all the stop
codons C<TGA>.  In a coding region, codons are 3-letter sequences, so
we can think of the DNA snippet as a sequence of 3-letter records.  The
naive regexp

    # expanded, this is "ATC GTT GAA TGC AAA TGA CAT GAC"
    $dna = "ATCGTTGAATGCAAATGACATGAC";
    $dna =~ /TGA/;

doesn't work; it may match a C<TGA>, but there is no guarantee that
the match is aligned with codon boundaries, e.g., the substring
S<C<GTT GAA> > gives a match.  A better solution is

    while ($dna =~ /(\w\w\w)*?TGA/g) {  # note the minimal *?
        print "Got a TGA stop codon at position ", pos $dna, "\n";
    }

which prints

    Got a TGA stop codon at position 18
    Got a TGA stop codon at position 23

Position 18 is good, but position 23 is bogus.  What happened?

The answer is that our regexp works well until we get past the last
real match.  Then the regexp will fail to match a synchronized C<TGA>
and start stepping ahead one character position at a time, not what we
want.  The solution is to use C<\G> to anchor the match to the codon
alignment:

    while ($dna =~ /\G(\w\w\w)*?TGA/g) {
        print "Got a TGA stop codon at position ", pos $dna, "\n";
    }

This prints

    Got a TGA stop codon at position 18

which is the correct answer.  This example illustrates that it is
important not only to match what is desired, but to reject what is not
desired.

B<Search and replace>

Regular expressions also play a big role in B<search and replace>
operations in Perl.  Search and replace is accomplished with the
C<s///> operator.  The general form is
C<s/regexp/replacement/modifiers>, with everything we know about
regexps and modifiers applying in this case as well.  The
C<replacement> is a Perl double quoted string that replaces in the
string whatever is matched with the C<regexp>.  The operator C<=~> is
also used here to associate a string with C<s///>.  If matching
against C<$_>, the S<C<$_ =~> > can be dropped.  If there is a match,
C<s///> returns the number of substitutions made, otherwise it returns
false.  Here are a few examples:

    $x = "Time to feed the cat!";
    $x =~ s/cat/hacker/;   # $x contains "Time to feed the hacker!"
    if ($x =~ s/^(Time.*hacker)!$/$1 now!/) {
        $more_insistent = 1;
    }
    $y = "'quoted words'";
    $y =~ s/^'(.*)'$/$1/;  # strip single quotes,
                           # $y contains "quoted words"

In the last example, the whole string was matched, but only the part
inside the single quotes was grouped.  With the C<s///> operator, the
matched variables C<$1>, C<$2>, etc.  are immediately available for use
in the replacement expression, so we use C<$1> to replace the quoted
string with just what was quoted.  With the global modifier, C<s///g>
will search and replace all occurrences of the regexp in the string:

    $x = "I batted 4 for 4";
    $x =~ s/4/four/;   # doesn't do it all:
                       # $x contains "I batted four for 4"
    $x = "I batted 4 for 4";
    $x =~ s/4/four/g;  # does it all:
                       # $x contains "I batted four for four"
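
Because C<s///> returns the number of substitutions, the return value
of C<s///g> doubles as a counter.  A minimal sketch, reusing the
string above (the name C<$count> is ours):

    use strict;
    use warnings;

    my $x = "I batted 4 for 4";
    my $count = ($x =~ s/4/four/g);   # number of replacements made
    print "$count replacements: $x\n";
    # 2 replacements: I batted four for four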

If you prefer 'regex' over 'regexp' in this tutorial, you could use
the following program to replace it:

    % cat > simple_replace
    #!/usr/bin/perl
    $regexp = shift;
    $replacement = shift;
    while (<>) {
        s/$regexp/$replacement/go;
        print;
    }
    ^D

    % simple_replace regexp regex perlretut.pod

In C<simple_replace> we used the C<s///g> modifier to replace all
occurrences of the regexp on each line and the C<s///o> modifier to
compile the regexp only once.  As with C<simple_grep>, both the
C<print> and the C<s/$regexp/$replacement/go> use C<$_> implicitly.

A modifier available specifically to search and replace is the
C<s///e> evaluation modifier.  C<s///e> wraps an C<eval{...}> around
the replacement string and the evaluated result is substituted for the
matched substring.  C<s///e> is useful if you need to do a bit of
computation in the process of replacing text.  This example counts
character frequencies in a line:

    $x = "Bill the cat";
    $x =~ s/(.)/$chars{$1}++;$1/eg;  # final $1 replaces char with itself
    print "frequency of '$_' is $chars{$_}\n"
        foreach (sort {$chars{$b} <=> $chars{$a}} keys %chars);

This prints

    frequency of ' ' is 2
    frequency of 't' is 2
    frequency of 'l' is 2
    frequency of 'B' is 1
    frequency of 'c' is 1
    frequency of 'e' is 1
    frequency of 'h' is 1
    frequency of 'i' is 1
    frequency of 'a' is 1
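
Here is another sketch of C<s///e>, this time using the result of the
evaluated code rather than discarding it.  The price list and the
doubling are made up for illustration:

    use strict;
    use warnings;

    my $prices = "apple 40, pear 60, plum 100";
    $prices =~ s/(\d+)/$1 * 2/eg;   # replace each number by twice its value
    print "$prices\n";              # apple 80, pear 120, plum 200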

As with the match C<m//> operator, C<s///> can use other delimiters,
such as C<s!!!> and C<s{}{}>, and even C<s{}//>.  If single quotes are
used C<s'''>, then the regexp and replacement are treated as single
quoted strings and there are no substitutions.  C<s///> in list context
returns the same thing as in scalar context, i.e., the number of
matches.

B<The split operator>

The B<C<split> > function can also optionally use a matching operator
C<m//> to split a string.  C<split /regexp/, string, limit> splits
C<string> into a list of substrings and returns that list.  The regexp
is used to match the character sequence that the C<string> is split
with respect to.  The C<limit>, if present, constrains splitting into
no more than C<limit> number of strings.  For example, to split a
string into words, use

    $x = "Calvin and Hobbes";
    @words = split /\s+/, $x;  # $words[0] = 'Calvin'
                               # $words[1] = 'and'
                               # $words[2] = 'Hobbes'

If the empty regexp C<//> is used, the regexp always matches and
the string is split into individual characters.  If the regexp has
groupings, then the list produced contains the matched substrings from the
groupings as well.  For instance,

    $x = "/usr/bin/perl";
    @dirs = split m!/!, $x;  # $dirs[0] = ''
                             # $dirs[1] = 'usr'
                             # $dirs[2] = 'bin'
                             # $dirs[3] = 'perl'
    @parts = split m!(/)!, $x;  # $parts[0] = ''
                                # $parts[1] = '/'
                                # $parts[2] = 'usr'
                                # $parts[3] = '/'
                                # $parts[4] = 'bin'
                                # $parts[5] = '/'
                                # $parts[6] = 'perl'

Since the first character of C<$x> matched the regexp, C<split> prepended
an empty initial element to the list.
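
The C<limit> argument described above is handy when only the first few
fields should be separated and the remainder kept intact.  A sketch
with an invented log line:

    use strict;
    use warnings;

    my $line = "ERROR 42 disk is almost full";
    my ($level, $code, $message) = split /\s+/, $line, 3;
    print "$level/$code/$message\n";   # ERROR/42/disk is almost full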

If you have read this far, congratulations! You now have all the basic
tools needed to use regular expressions to solve a wide range of text
processing problems.  If this is your first time through the tutorial,
why not stop here and play around with regexps a while...  S<Part 2>
concerns the more esoteric aspects of regular expressions and those
concepts certainly aren't needed right at the start.

=head1 Part 2: Power tools

OK, you know the basics of regexps and you want to know more.  If
matching regular expressions is analogous to a walk in the woods, then
the tools discussed in Part 1 are analogous to topo maps and a
compass, basic tools we use all the time.  Most of the tools in Part 2
are analogous to flare guns and satellite phones.  They aren't used
too often on a hike, but when we are stuck, they can be invaluable.

What follows are the more advanced, less used, or sometimes esoteric
capabilities of perl regexps.  In Part 2, we will assume you are
comfortable with the basics and concentrate on the new features.

=head2 More on characters, strings, and character classes

There are a number of escape sequences and character classes that we
haven't covered yet.

There are several escape sequences that convert characters or strings
between upper and lower case.  C<\l> and C<\u> convert the next
character to lower or upper case, respectively:

    $x = "perl";
    $string =~ /\u$x/;  # matches 'Perl' in $string
    $x = "M(rs?|s)\\."; # note the double backslash
    $string =~ /\l$x/;  # matches 'mr.', 'mrs.', and 'ms.'

C<\L> and C<\U> convert a whole substring, delimited by C<\L> or
C<\U> and C<\E>, to lower or upper case:

    $x = "This word is in lower case:\L SHOUT\E";
    $x =~ /shout/;       # matches
    $x = "I STILL KEYPUNCH CARDS FOR MY 360";
    $x =~ /\Ukeypunch/;  # matches punch card string

If there is no C<\E>, case is converted until the end of the
string. The regexps C<\L\u$word> or C<\u\L$word> convert the first
character of C<$word> to uppercase and the rest of the characters to
lowercase.
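
For instance, C<\u\L> inside the replacement part of C<s///> gives a
quick way to normalize capitalization.  A sketch with an invented
name:

    use strict;
    use warnings;

    my $name = "hOMER jAY sIMPSON";
    $name =~ s/(\w+)/\u\L$1/g;   # first letter upper case, the rest lower
    print "$name\n";             # Homer Jay Simpson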

Control characters can be escaped with C<\c>, so that a control-Z
character would be matched with C<\cZ>.  The escape sequence
C<\Q>...C<\E> quotes, or protects, most non-alphabetic characters.   For
instance,

    $x = "That !^*&%~& cat!";  # no \Q here, or backslashes end up in $x
    $x =~ /\Q!^*&%~&\E/;  # check for rough language

It does not protect C<$> or C<@>, so that variables can still be
substituted.
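
C<\Q> is most useful when the text to search for comes from a variable
and should be matched literally.  In this sketch (the strings are
invented), the unquoted pattern treats C<.> and C<+> as metacharacters
while the quoted one does not:

    use strict;
    use warnings;

    my $literal  = "1.5+";
    my $text     = "version 125+ released";
    my $unquoted = $text =~ /$literal/     ? "match" : "no match";
    my $quoted   = $text =~ /\Q$literal\E/ ? "match" : "no match";
    print "unquoted: $unquoted, quoted: $quoted\n";
    # unquoted: match, quoted: no match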

With the advent of 5.6.0, perl regexps can handle more than just the
standard ASCII character set.  Perl now supports B<Unicode>, a standard
for encoding the character sets from many of the world's written
languages.  Unicode does this by allowing characters to be more than
one byte wide.  Perl uses the UTF-8 encoding, in which ASCII characters
are still encoded as one byte, but characters greater than C<chr(127)>
may be stored as two or more bytes.

What does this mean for regexps? Well, regexp users don't need to know
much about perl's internal representation of strings.  But they do need
to know 1) how to represent Unicode characters in a regexp and 2) when
a matching operation will treat the string to be searched as a
sequence of bytes (the old way) or as a sequence of Unicode characters
(the new way).  The answer to 1) is that Unicode characters greater
than C<chr(127)> may be represented using the C<\x{hex}> notation,
with C<hex> a hexadecimal integer:

    use utf8;    # We will be doing Unicode processing
    /\x{263a}/;  # match a Unicode smiley face :)

Unicode characters in the range of 128-255 use two hexadecimal digits
with braces: C<\x{ab}>.  Note that this is different from C<\xab>,
which is just a hexadecimal byte with no Unicode
significance.

Figuring out the hexadecimal sequence of a Unicode character you want
or deciphering someone else's hexadecimal Unicode regexp is about as
much fun as programming in machine code.  So another way to specify
Unicode characters is to use the S<B<named character> > escape
sequence C<\N{name}>.  C<name> is a name for the Unicode character, as
specified in the Unicode standard.  For instance, if we wanted to
represent or match the astrological sign for the planet Mercury, we
could use

    use utf8;              # We will be doing Unicode processing
    use charnames ":full"; # use named chars with Unicode full names
    $x = "abc\N{MERCURY}def";
    $x =~ /\N{MERCURY}/;   # matches

One can also use short names or restrict names to a certain alphabet:

    use utf8;              # We will be doing Unicode processing

    use charnames ':full';
    print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n";

    use charnames ":short";
    print "\N{greek:Sigma} is an upper-case sigma.\n";

    use charnames qw(greek);
    print "\N{sigma} is Greek sigma\n";

A list of full names is found in the file Names.txt in the
lib/perl5/5.6.0/unicode directory.

The answer to requirement 2), as of 5.6.0, is that if a regexp
contains Unicode characters, the string is searched as a sequence of
Unicode characters.  Otherwise, the string is searched as a sequence of
bytes.  If the string is being searched as a sequence of Unicode
characters, but matching a single byte is required, we can use the C<\C>
escape sequence.  C<\C> is a character class akin to C<.> except that
it matches I<any> byte 0-255.  So

    use utf8;              # We will be doing Unicode processing
    use charnames ":full"; # use named chars with Unicode full names
    $x = "a";
    $x =~ /\C/;  # matches 'a', eats one byte
    $x = "";
    $x =~ /\C/;  # doesn't match, no bytes to match
    $x = "\N{MERCURY}";  # multi-byte Unicode character
    $x =~ /\C/;  # matches, but dangerous!

The last regexp matches, but is dangerous because the string
I<character> position is no longer synchronized to the string I<byte>
position.  This generates the warning 'Malformed UTF-8
character'.  C<\C> is best used for matching the binary data in strings
with binary data intermixed with Unicode characters.

Let us now discuss the rest of the character classes.  Just as with
Unicode characters, there are named Unicode character classes
represented by the C<\p{name}> escape sequence.  Closely associated is
the C<\P{name}> character class, which is the negation of the
C<\p{name}> class.  For example, to match lower and uppercase
characters,

    use utf8;              # We will be doing Unicode processing
    use charnames ":full"; # use named chars with Unicode full names
    $x = "BOB";
    $x =~ /^\p{IsUpper}/;   # matches, uppercase char class
    $x =~ /^\P{IsUpper}/;   # doesn't match, char class sans uppercase
    $x =~ /^\p{IsLower}/;   # doesn't match, lowercase char class
    $x =~ /^\P{IsLower}/;   # matches, char class sans lowercase

Here is the association between some Perl named classes and the
traditional Unicode classes:

    Perl class name  Unicode class name or regular expression

    IsAlpha          /^[LM]/
    IsAlnum          /^[LMN]/
    IsASCII          $code <= 127
    IsCntrl          /^C/
    IsBlank          $code =~ /^(0020|0009)$/ || /^Z[^lp]/
    IsDigit          Nd
    IsGraph          /^([LMNPS]|Co)/
    IsLower          Ll
    IsPrint          /^([LMNPS]|Co|Zs)/
    IsPunct          /^P/
    IsSpace          /^Z/ || ($code =~ /^(0009|000A|000B|000C|000D)$/)
    IsSpacePerl      /^Z/ || ($code =~ /^(0009|000A|000C|000D)$/)
    IsUpper          /^L[ut]/
    IsWord           /^[LMN]/ || $code eq "005F"
    IsXDigit         $code =~ /^00(3[0-9]|[46][1-6])$/

You can also use the official Unicode class names with C<\p> and
C<\P>, like C<\p{L}> for Unicode 'letters', or C<\p{Lu}> for uppercase
letters, or C<\P{Nd}> for non-digits.  If a C<name> is just one
letter, the braces can be dropped.  For instance, C<\pM> is the
character class of Unicode 'marks'.
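
The sketch below counts characters by Unicode class.  It assumes a
reasonably modern perl, where a string containing characters above 255
is matched with Unicode rules without further pragmas; the sample
string is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical sample: capital sigma, small sigma, 'e' followed by
# a combining acute accent, a space, and two digits.
my $x = "\x{3a3}\x{3c3}e\x{301} 42";

# The '() =' idiom counts the matches of a //g pattern.
my $letters   = () = $x =~ /\pL/g;      # one-letter name, no braces
my $upper     = () = $x =~ /\p{Lu}/g;   # uppercase letters only
my $marks     = () = $x =~ /\pM/g;      # combining marks
my $nondigits = () = $x =~ /\P{Nd}/g;   # anything but a decimal digit

print "letters=$letters upper=$upper marks=$marks nondigits=$nondigits\n";
```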

C<\X> is an abbreviation for a character class sequence that includes
the Unicode 'combining character sequences'.  A 'combining character
sequence' is a base character followed by any number of combining
characters.  An example of a combining character is an accent.   Using
the Unicode full names, e.g., S<C<A + COMBINING RING> > is a combining
character sequence with base character C<A> and combining character
S<C<COMBINING RING> >, which translates in Danish to A with the circle
atop it, as in the word Angstrom.  C<\X> is equivalent to C<\PM\pM*>,
i.e., a non-mark followed by zero or more marks.

As if all those classes weren't enough, Perl also defines POSIX style
character classes.  These have the form C<[:name:]>, with C<name> the
name of the POSIX class.  The POSIX classes are C<alpha>, C<alnum>,
C<ascii>, C<cntrl>, C<digit>, C<graph>, C<lower>, C<print>, C<punct>,
C<space>, C<upper>, and C<xdigit>, and two extensions, C<word> (a Perl
extension to match C<\w>), and C<blank> (a GNU extension).  If C<utf8>
is being used, then these classes are defined the same as their
corresponding perl Unicode classes: C<[:upper:]> is the same as
C<\p{IsUpper}>, etc.  The POSIX character classes, however, don't
require using C<utf8>.  The C<[:digit:]>, C<[:word:]>, and
C<[:space:]> correspond to the familiar C<\d>, C<\w>, and C<\s>
character classes.  To negate a POSIX class, put a C<^> in front of
the name, so that, e.g., C<[:^digit:]> corresponds to C<\D> and under
C<utf8>, C<\P{IsDigit}>.  The Unicode and POSIX character classes can
be used just like C<\d>, with the exception that POSIX character
classes can only be used inside of a bracketed character class:

    /\s+[abc[:digit:]xyz]\s*/;  # match a,b,c,x,y,z, or a digit
    /^=item\s[[:digit:]]/;      # match '=item',
                                # followed by a space and a digit
    use utf8;
    use charnames ":full";
    /\s+[abc\p{IsDigit}xyz]\s+/;  # match a,b,c,x,y,z, or a digit
    /^=item\s\p{IsDigit}/;        # match '=item',
                                  # followed by a space and a digit

Whew! That is all the rest of the characters and character classes.

=head2 Compiling and saving regular expressions

In Part 1 we discussed the C<//o> modifier, which compiles a regexp
just once.  This suggests that a compiled regexp is some data structure
that can be stored once and used again and again.  The regexp quote
C<qr//> does exactly that: C<qr/string/> compiles the C<string> as a
regexp and transforms the result into a form that can be assigned to a
variable:

    $reg = qr/foo+bar?/;  # reg contains a compiled regexp

Then C<$reg> can be used as a regexp:

    $x = "fooooba";
    $x =~ $reg;     # matches, just like /foo+bar?/
    $x =~ /$reg/;   # same thing, alternate form

C<$reg> can also be interpolated into a larger regexp:

    $x =~ /(abc)?$reg/;  # still matches

As with the matching operator, the regexp quote can use different
delimiters, e.g., C<qr!!>, C<qr{}> and C<qr~~>.  The single quote
delimiters C<qr''> prevent any interpolation from taking place.
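
A short sketch of why the choice of delimiter matters; the path and the
address pattern are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical directory name to interpolate.
my $dir = "usr";

# Braces as delimiters avoid escaping the slashes in a path;
# $dir is still interpolated.
my $path_re = qr{^/$dir/local/};

# Single quotes suppress interpolation, so '@example' is taken
# literally instead of being treated as an array.
my $addr_re = qr'^\w+@example\.com$';

my $path_ok = "/usr/local/bin/perl" =~ $path_re ? 1 : 0;
my $addr_ok = 'bob@example.com'     =~ $addr_re ? 1 : 0;

print "path_ok=$path_ok addr_ok=$addr_ok\n";
```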

Pre-compiled regexps are useful for creating dynamic matches that
don't need to be recompiled each time they are encountered.  Using
pre-compiled regexps, the C<simple_grep> program can be expanded into a
program that matches multiple patterns:

    % cat > multi_grep
    #!/usr/bin/perl
    # multi_grep - match any of <number> regexps
    # usage: multi_grep <number> regexp1 regexp2 ... file1 file2 ...

    $number = shift;
    $regexp[$_] = shift foreach (0..$number-1);
    @compiled = map qr/$_/, @regexp;
    while ($line = <>) {
        foreach $pattern (@compiled) {
            if ($line =~ /$pattern/) {
                print $line;
                last;  # we matched, so move onto the next line
            }
        }
    }
    ^D

    % multi_grep 2 last for multi_grep
        $regexp[$_] = shift foreach (0..$number-1);
            foreach $pattern (@compiled) {
                    last;

Storing pre-compiled regexps in an array C<@compiled> allows us to
simply loop through the regexps without any recompilation, thus gaining
flexibility without sacrificing speed.

=head2 Embedding comments and modifiers in a regular expression

Starting with this section, we will be discussing Perl's set of
B<extended patterns>.  These are extensions to the traditional regular
expression syntax that provide powerful new tools for pattern
matching.  We have already seen extensions in the form of the minimal
matching constructs C<??>, C<*?>, C<+?>, C<{n,m}?>, and C<{n,}?>.  The
rest of the extensions below have the form C<(?char...)>, where the
C<char> is a character that determines the type of extension.

The first extension is an embedded comment C<(?#text)>.  This embeds a
comment into the regular expression without affecting its meaning.  The
comment should not have any closing parentheses in the text.  An
example is

    /(?# Match an integer:)[+-]?\d+/;

This style of commenting has been largely superseded by the raw,
freeform commenting that is allowed with the C<//x> modifier.

The modifiers C<//i>, C<//m>, C<//s>, and C<//x> can also be embedded in
a regexp using C<(?i)>, C<(?m)>, C<(?s)>, and C<(?x)>.  For instance,

    /(?i)yes/;  # match 'yes' case insensitively
    /yes/i;     # same thing
    /(?x)(          # freeform version of an integer regexp
             [+-]?  # match an optional sign
             \d+    # match a sequence of digits
         )
    /x;

Embedded modifiers can have two important advantages over the usual
modifiers.  Embedded modifiers allow a custom set of modifiers for
I<each> regexp pattern.  This is great for matching an array of regexps
that must have different modifiers:

    $pattern[0] = '(?i)doctor';
    $pattern[1] = 'Johnson';
    ...
    while (<>) {
        foreach $patt (@pattern) {
            print if /$patt/;
        }
    }

The second advantage is that embedded modifiers only affect the regexp
inside the group the embedded modifier is contained in.  So grouping
can be used to localize the modifier's effects:

    /Answer: ((?i)yes)/;  # matches 'Answer: yes', 'Answer: YES', etc.

Embedded modifiers can also turn off any modifiers already present
by using, e.g., C<(?-i)>.  Modifiers can also be combined into
a single expression, e.g., C<(?s-i)> turns on single line mode and
turns off case insensitivity.
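
To make the effect of per-pattern and switched-off modifiers concrete,
here is a minimal sketch.  The input lines are invented, and the loop
mirrors the pattern-array example earlier in this section.

```perl
use strict;
use warnings;

# Each pattern string carries its own modifiers.
my @patterns = ('(?i)doctor', 'Johnson');

# Hypothetical input lines.
my @lines = ("DOCTOR Who", "doctor johnson", "Mr. Johnson", "mr. JOHNSON");

my @hits;
foreach my $line (@lines) {
    foreach my $patt (@patterns) {
        if ($line =~ /$patt/) {
            push @hits, $line;
            last;  # one hit per line is enough
        }
    }
}

# (?-i) switches case insensitivity back off inside its group,
# even though the regexp as a whole uses //i.
my $upper_yes = "Answer: YES" =~ /answer: ((?-i)YES)/i ? 1 : 0;
my $lower_yes = "Answer: yes" =~ /answer: ((?-i)YES)/i ? 1 : 0;

print scalar(@hits), " lines hit; upper=$upper_yes lower=$lower_yes\n";
```

Only the last line, C<mr. JOHNSON>, is rejected: C<Johnson> carries no
C<(?i)> and so remains case sensitive.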

=head2 Non-capturing groupings

We noted in Part 1 that groupings C<()> had two distinct functions: 1)
group regexp elements together as a single unit, and 2) extract, or
capture, substrings that matched the regexp in the
grouping.  Non-capturing groupings, denoted by C<(?:regexp)>, allow the
regexp to be treated as a single unit, but don't extract substrings or
set matching variables C<$1>, etc.  Both capturing and non-capturing
groupings are allowed to co-exist in the same regexp.  Because there is
no extraction, non-capturing groupings are faster than capturing
groupings.  Non-capturing groupings are also handy for choosing exactly
which parts of a regexp are to be extracted to matching variables:

    # match a number, $1-$4 are set, but we only want $1
    /([+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)/;

    # match a number faster, only $1 is set
    /([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)/;

    # match a number, get $1 = whole number, $2 = exponent
    /([+-]?\ *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE]([+-]?\d+))?)/;

Non-capturing groupings are also useful for removing nuisance
elements gathered from a split operation:

    $x = '12a34b5';
    @num = split /(a|b)/, $x;    # @num = ('12','a','34','b','5')
    @num = split /(?:a|b)/, $x;  # @num = ('12','34','5')

Non-capturing groupings may also have embedded modifiers:
C<(?i-m:regexp)> is a non-capturing grouping that matches C<regexp>
case insensitively and turns off multi-line mode.
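
A brief sketch of a non-capturing grouping with an embedded modifier;
the sample strings and the unit suffix are assumptions made for this
example.

```perl
use strict;
use warnings;

# Only the unit suffix is matched case insensitively, and it is
# not captured, so the number is the only captured value.
my $size_re = qr/Size: (\d+)(?i:kb)/;

my ($n) = "Size: 10KB" =~ $size_re;

# 'Size' itself is outside the (?i:...) group and therefore
# remains case sensitive.
my $lower = "size: 10kb" =~ $size_re ? 1 : 0;

print "n=$n lower=$lower\n";
```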

=head2 Looking ahead and looking behind

This section concerns the lookahead and lookbehind assertions.  First,
a little background.

In Perl regular expressions, most regexp elements 'eat up' a certain
amount of string when they match.  For instance, the regexp element
C<[abc]> eats up one character of the string when it matches, in the
sense that perl moves to the next character position in the string
after the match.  There are some elements, however, that don't eat up
characters (advance the character position) if they match.  The examples
we have seen so far are the anchors.  The anchor C<^> matches the
beginning of the line, but doesn't eat any characters.  Similarly, the
word boundary anchor C<\b> matches, e.g., if the character to the left
is a word character and the character to the right is a non-word
character, but it doesn't eat up any characters itself.  Anchors are
examples of 'zero-width assertions'.  Zero-width, because they consume
no characters, and assertions, because they test some property of the
string.  In the context of our walk in the woods analogy to regexp
matching, most regexp elements move us along a trail, but anchors have
us stop a moment and check our surroundings.  If the local environment
checks out, we can proceed forward.  But if the local environment
doesn't satisfy us, we must backtrack.

Checking the environment entails either looking ahead on the trail,
looking behind, or both.  C<^> looks behind, to see that there are no
characters before.  C<$> looks ahead, to see that there are no
characters after.  C<\b> looks both ahead and behind, to see if the
characters on either side differ in their 'word'-ness.

The lookahead and lookbehind assertions are generalizations of the
anchor concept.  Lookahead and lookbehind are zero-width assertions
that let us specify which characters we want to test for.  The
lookahead assertion is denoted by C<(?=regexp)> and the lookbehind
assertion is denoted by C<< (?<=fixed-regexp) >>.  Some examples are

    $x = "I catch the housecat 'Tom-cat' with catnip";
    $x =~ /cat(?=\s+)/;  # matches 'cat' in 'housecat'
    @catwords = ($x =~ /(?<=\s)cat\w+/g);  # matches,
                                           # $catwords[0] = 'catch'
                                           # $catwords[1] = 'catnip'
    $x =~ /\bcat\b/;  # matches 'cat' in 'Tom-cat'
    $x =~ /(?<=\s)cat(?=\s)/; # doesn't match; no isolated 'cat' in
                              # middle of $x

Note that the parentheses in C<(?=regexp)> and C<< (?<=regexp) >> are
non-capturing, since these are zero-width assertions.  Thus in the
second regexp, the substrings captured are those of the whole regexp
itself.  Lookahead C<(?=regexp)> can match arbitrary regexps, but
lookbehind C<< (?<=fixed-regexp) >> only works for regexps of fixed
width, i.e., a fixed number of characters long.  Thus
C<< (?<=(ab|bc)) >> is fine, but C<< (?<=(ab)*) >> is not.  The
negated versions of the lookahead and lookbehind assertions are
denoted by C<(?!regexp)> and C<< (?<!fixed-regexp) >> respectively.
They evaluate true if the regexps do I<not> match:

    $x = "foobar";
    $x =~ /foo(?!bar)/;  # doesn't match, 'bar' follows 'foo'
    $x =~ /foo(?!baz)/;  # matches, 'baz' doesn't follow 'foo'
    $x =~ /(?<!\s)foo/;  # matches, there is no \s before 'foo'
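
Two further sketches of what zero-width assertions make possible.  The
first filters words with a negative lookahead; the second uses a
well-known idiom, not taken from this tutorial, that inserts commas
into a number by substituting at positions that match only assertions.

```perl
use strict;
use warnings;

my $x = "I catch the housecat 'Tom-cat' with catnip";

# Words starting with 'cat', except those continuing with 'nip'.
my @words = $x =~ /\bcat(?!nip)\w*/g;

# Insert a comma wherever a digit precedes and a whole number of
# three-digit groups follows.  Nothing is consumed, so only the
# comma is added.
my $n = "1234567";
(my $commified = $n) =~ s/(?<=\d)(?=(?:\d{3})+\b)/,/g;

print "@words\n$commified\n";
```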

=head2 Using independent subexpressions to prevent backtracking

The last few extended patterns in this tutorial are experimental as of
5.6.0.  Play with them, use them in some code, but don't rely on them
just yet for production code.

S<B<Independent subexpressions> > are regular expressions, in the
context of a larger regular expression, that function independently of
the larger regular expression.  That is, they consume as much or as
little of the string as they wish without regard for the ability of
the larger regexp to match.  Independent subexpressions are represented
by C<< (?>regexp) >>.  We can illustrate their behavior by first
considering an ordinary regexp:

    $x = "ab";
    $x =~ /a*ab/;  # matches

This obviously matches, but in the process of matching, the
subexpression C<a*> first grabbed the C<a>.  Doing so, however,
wouldn't allow the whole regexp to match, so after backtracking, C<a*>
eventually gave back the C<a> and matched the empty string.  Here, what
C<a*> matched was I<dependent> on what the rest of the regexp matched.

Contrast that with an independent subexpression:

    $x =~ /(?>a*)ab/;  # doesn't match!

The independent subexpression C<< (?>a*) >> doesn't care about the rest
of the regexp, so it sees an C<a> and grabs it.  Then the rest of the
regexp C<ab> cannot match.  Because C<< (?>a*) >> is independent, there
is no backtracking and the independent subexpression does not give
up its C<a>.  Thus the match of the regexp as a whole fails.  A similar
behavior occurs with completely independent regexps:

    $x = "ab";
    $x =~ /a*/g;   # matches, eats an 'a'
    $x =~ /\Gab/g; # doesn't match, no 'a' available

Here C<//g> and C<\G> create a 'tag team' handoff of the string from
one regexp to the other.  Regexps with an independent subexpression are
much like this, with a handoff of the string to the independent
subexpression, and a handoff of the string back to the enclosing
regexp.
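
The difference between the two kinds of subexpression can be checked
directly.  This is a minimal sketch with invented variable names; as
noted above, the construct is experimental as of 5.6.0.

```perl
use strict;
use warnings;

my $x = "ab";

# Ordinary subexpression: a* gives back its 'a' after backtracking.
my $ordinary = $x =~ /a*ab/ ? 1 : 0;

# Independent subexpression: (?>a*) keeps its 'a', so 'ab' cannot
# match at any starting position.
my $independent = $x =~ /(?>a*)ab/ ? 1 : 0;

# When the rest of the regexp does not need what the independent
# subexpression grabbed, the match succeeds as usual.
my ($run) = "aaab" =~ /((?>a+))b/;

print "ordinary=$ordinary independent=$independent run=$run\n";
```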

The ability of an independent subexpression to prevent backtracking
can be quite useful.  Suppose we want to match a non-empty string
enclosed in parentheses up to two levels deep.  Then the following
regexp matches:

